TY - JOUR
T1 - Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization
T2 - block-specific steplengths and adapted batch-sizes
AU - Lei, Jinlong
AU - Shanbhag, Uday V.
N1 - Publisher Copyright:
© 2020 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
AB - This work considers the minimization of a sum of an expectation-valued coordinate-wise smooth nonconvex function and a nonsmooth block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm in which, at each iteration, a single block is randomly chosen to update its estimates via a proximal variable sample-size stochastic gradient scheme, while the remaining blocks are kept invariant. Notably, each block employs a steplength that relies on its block-specific Lipschitz constant, while batch sizes are updated as a function of the number of times that block is selected. We show that every limit point is a stationary point and establish the ergodic non-asymptotic rate O(1/K). The iteration and oracle complexity to obtain an ε-stationary point are shown to be O(1/ε) and O(1/ε²), respectively. Furthermore, under a proximal Polyak–Łojasiewicz condition with batch sizes increasing at a geometric rate, we prove that the suboptimality diminishes at a geometric rate, the optimal deterministic rate, while the iteration and oracle complexity to obtain an ε-optimal solution are O(ln(1/ε)) and O((1/ε)^(1+c)) with c ≥ 0, respectively. In the single-block setting, we obtain the optimal oracle complexity O(1/ε). Finally, preliminary numerics suggest that the schemes compare well with competitors reliant on global Lipschitz constants.
UR - http://www.scopus.com/inward/record.url?scp=85083551489&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083551489&partnerID=8YFLogxK
DO - 10.1080/10556788.2020.1746963
M3 - Article
AN - SCOPUS:85083551489
SN - 1055-6788
VL - 37
SP - 264
EP - 294
JO - Optimization Methods and Software
JF - Optimization Methods and Software
IS - 1
ER -
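
The abstract describes the update pattern at a high level: at each iteration one randomly chosen block is updated by a proximal stochastic-gradient step with a block-specific steplength, using a mini-batch whose size grows with the number of times that block has been selected. The following is a minimal Python sketch of that pattern only; the toy quadratic loss, the l1 regularizer, the names (A_star, stoch_block_grad, prox_l1), and the polynomial batch-size schedule are illustrative assumptions, not the paper's exact construction.

import numpy as np

rng = np.random.default_rng(0)

d, n_blocks = 20, 4                        # dimension, number of blocks
blocks = np.array_split(np.arange(d), n_blocks)
A_star = rng.normal(size=d)                # ground truth for a toy loss

def stoch_block_grad(x, blk, batch):
    """Mini-batch gradient of E[0.5*||x - (A_star + noise)||^2] w.r.t. block blk."""
    g = np.zeros(len(blk))
    for _ in range(batch):
        noise = rng.normal(scale=0.5, size=len(blk))
        g += x[blk] - (A_star[blk] + noise)
    return g / batch

def prox_l1(v, t, lam=0.01):
    """Proximal operator of lam*||.||_1 with steplength t (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

# Block-specific Lipschitz constants; the toy quadratic gives L_i = 1, but
# they are kept as parameters to mirror the block-specific steplengths.
L = np.ones(n_blocks)
gamma = 1.0 / L                            # steplength per block
counts = np.zeros(n_blocks, dtype=int)     # times each block was selected

x = np.zeros(d)
for k in range(2000):
    i = rng.integers(n_blocks)             # one randomly chosen block per iteration
    counts[i] += 1
    # Batch size adapted to this block's selection count; polynomial growth
    # is one schedule consistent with the variance-reduction idea (assumed).
    N_i = counts[i] ** 2 + 1
    g = stoch_block_grad(x, blocks[i], N_i)
    x[blocks[i]] = prox_l1(x[blocks[i]] - gamma[i] * g, gamma[i])
    # all remaining blocks are kept invariant

The point of the sketch is only the update pattern: a single block per iteration, steplength 1/L_i tied to that block's Lipschitz constant, and a batch size tied to counts[i] rather than to the global iteration counter.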
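The geometric-rate result rests on a proximal Polyak–Łojasiewicz condition, whose form the record does not state. The LaTeX block below sketches one standard proximal generalization of the PL inequality (in the sense of Karimi, Nutini and Schmidt) on which such results are typically built, for F = f + g with ∇f L-Lipschitz and μ > 0; whether the paper uses exactly this variant, or a block-wise one, is an assumption.

% One standard form of the proximal PL condition (an assumption here;
% the paper may use a block-wise variant). For F(x) = f(x) + g(x):
\[
  \tfrac{1}{2}\,\mathcal{D}_g(x, L) \;\ge\; \mu\,\bigl(F(x) - F^{*}\bigr),
\quad\text{where}\quad
  \mathcal{D}_g(x,\alpha) = -2\alpha \min_{y}\Bigl[\langle \nabla f(x),\, y - x\rangle
    + \tfrac{\alpha}{2}\,\|y - x\|^{2} + g(y) - g(x)\Bigr].
\]
% Under this condition, a deterministic proximal-gradient step with
% steplength 1/L satisfies
%   F(x_{k+1}) - F^* <= (1 - \mu/L)\,(F(x_k) - F^*),
% i.e. geometric decay, which is the "optimal deterministic rate" the
% abstract refers to; the stochastic scheme matches it by letting batch
% sizes grow geometrically.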