TY - GEN
T1 - Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization
T2 - 36th International Conference on Machine Learning, ICML 2019
AU - Haddadpour, Farzin
AU - Kamani, Mohammad Mahdi
AU - Mahdavi, Mehrdad
AU - Cadambe, Viveck R.
N1 - Publisher Copyright:
Copyright 2019 by the author(s).
PY - 2019
Y1 - 2019
AB - Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms for training large neural networks. In recent years, a great deal of research has sought to alleviate communication cost by compressing the gradient vector or by using local updates and periodic model averaging. In this paper, we advocate the use of redundancy towards communication-efficient distributed stochastic algorithms for non-convex optimization. In particular, we show, both theoretically and practically, that by properly infusing redundancy into the training data together with model averaging, it is possible to significantly reduce the number of communication rounds. More precisely, we show that redundancy reduces the residual error in local averaging, thereby reaching the same level of accuracy with fewer rounds of communication than previous algorithms. Empirical studies on the CIFAR10, CIFAR100 and ImageNet datasets in a distributed environment complement our theoretical results; they show that our algorithms have additional beneficial properties, including tolerance to failures and greater gradient diversity.
UR - http://www.scopus.com/inward/record.url?scp=85078093040&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078093040&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85078093040
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 4497
EP - 4527
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
Y2 - 9 June 2019 through 15 June 2019
ER -