TY - JOUR
T1 - Hessian-Free Distributed Bilevel Optimization Via Penalization With Time-Scale Separation
AU - Niu, Youcheng
AU - Xu, Jinming
AU - Sun, Ying
AU - Chai, Li
AU - Chen, Jiming
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - This paper considers a class of distributed bilevel optimization (DBO) problems with a coupled inner-level subproblem. Existing approaches typically rely on hypergradient estimations involving computationally expensive Hessian evaluation. To address this, we approximate the DBO problem as a minimax problem by properly designing a penalty term that enforces both the constraint imposed by the inner-level subproblem and the consensus among the decision variables of agents. Moreover, we propose a loopless distributed algorithm, AHEAD, that employs multiple-timescale updates to solve the approximate problem asymptotically without requiring Hessian computation. Theoretically, we establish sharp convergence rates for nonconvex-strongly-convex settings and for distributed minimax problems as special cases. Our analysis reveals a clear dependence of convergence performance on node heterogeneity, penalty parameters, and network connectivity, with a weaker assumption on heterogeneity that only requires bounded gradients at the optimum. Numerical experiments corroborate our theoretical results.
AB - This paper considers a class of distributed bilevel optimization (DBO) problems with a coupled inner-level subproblem. Existing approaches typically rely on hypergradient estimations involving computationally expensive Hessian evaluation. To address this, we approximate the DBO problem as a minimax problem by properly designing a penalty term that enforces both the constraint imposed by the inner-level subproblem and the consensus among the decision variables of agents. Moreover, we propose a loopless distributed algorithm, AHEAD, that employs multiple-timescale updates to solve the approximate problem asymptotically without requiring Hessian computation. Theoretically, we establish sharp convergence rates for nonconvex-strongly-convex settings and for distributed minimax problems as special cases. Our analysis reveals a clear dependence of convergence performance on node heterogeneity, penalty parameters, and network connectivity, with a weaker assumption on heterogeneity that only requires bounded gradients at the optimum. Numerical experiments corroborate our theoretical results.
UR - https://www.scopus.com/pages/publications/105027781503
UR - https://www.scopus.com/pages/publications/105027781503#tab=citedBy
U2 - 10.1109/TAC.2026.3653308
DO - 10.1109/TAC.2026.3653308
M3 - Article
AN - SCOPUS:105027781503
SN - 0018-9286
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
ER -