TY - JOUR
T1 - The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
AU - Wu, Tongle
AU - Li, Zhize
AU - Sun, Ying
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating local update steps can reduce communication complexity. Specifically, for μ-strongly convex and L-smooth loss functions, we proved that local DGT achieves communication complexity O(L/(μ(K+1)) + (δ+μ)/(μ(1-ρ)) + ρ/(1-ρ)^2 · (L+δ)/μ), where K is the number of additional local updates, ρ measures the network connectivity and δ measures the second-order heterogeneity of the local losses. Our results reveal the tradeoff between communication and computation and show increasing K can effectively reduce communication costs when the data heterogeneity is low and the network is well-connected. We then consider the over-parameterization regime where the local losses share the same minimums. We proved that employing local updates in DGD, even without gradient correction, achieves exact linear convergence under the Polyak-Łojasiewicz (PL) condition, which can yield a similar effect as DGT in reducing communication complexity. Customization of the result to linear models is further provided, with improved rate expression. Numerical experiments validate our theoretical results.
AB - We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating local update steps can reduce communication complexity. Specifically, for μ-strongly convex and L-smooth loss functions, we proved that local DGT achieves communication complexity O(L/(μ(K+1)) + (δ+μ)/(μ(1-ρ)) + ρ/(1-ρ)^2 · (L+δ)/μ), where K is the number of additional local updates, ρ measures the network connectivity and δ measures the second-order heterogeneity of the local losses. Our results reveal the tradeoff between communication and computation and show increasing K can effectively reduce communication costs when the data heterogeneity is low and the network is well-connected. We then consider the over-parameterization regime where the local losses share the same minimums. We proved that employing local updates in DGD, even without gradient correction, achieves exact linear convergence under the Polyak-Łojasiewicz (PL) condition, which can yield a similar effect as DGT in reducing communication complexity. Customization of the result to linear models is further provided, with improved rate expression. Numerical experiments validate our theoretical results.
UR - https://www.scopus.com/pages/publications/85216710496
UR - https://www.scopus.com/inward/citedby.url?scp=85216710496&partnerID=8YFLogxK
U2 - 10.1109/TSP.2025.3533208
DO - 10.1109/TSP.2025.3533208
M3 - Article
AN - SCOPUS:85216710496
SN - 1053-587X
VL - 73
SP - 751
EP - 765
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
ER -