TY - JOUR
T1 - A Variable Sample-Size Stochastic Quasi-Newton Method for Smooth and Nonsmooth Stochastic Convex Optimization
AU - Jalilzadeh, Afrooz
AU - Nedić, Angelia
AU - Shanbhag, Uday V.
AU - Yousefian, Farzad
N1 - Publisher Copyright:
© 2021 INFORMS.
PY - 2022/2
Y1 - 2022/2
N2 - Classical theory for quasi-Newton schemes has focused on smooth, deterministic, unconstrained optimization, whereas recent forays into stochastic convex optimization have largely resided in smooth, unconstrained, and strongly convex regimes. Naturally, there is a compelling need to address nonsmoothness, the lack of strong convexity, and the presence of constraints. Accordingly, this paper presents a quasi-Newton framework that can process merely convex and possibly nonsmooth (but smoothable) stochastic convex problems. We propose a framework that combines iterative smoothing and regularization with a variance-reduced scheme that relies on an increasing sample size of gradients. We make the following contributions. (i) We develop a regularized and smoothed variable sample-size BFGS update (rsL-BFGS) that generates a sequence of Hessian approximations and can accommodate nonsmooth convex objectives by utilizing iterative regularization and smoothing. (ii) In strongly convex regimes with state-dependent noise, the proposed variable sample-size stochastic quasi-Newton (VS-SQN) scheme admits a nonasymptotic linear rate of convergence, while the oracle complexity of computing an ϵ-solution is O(κ^(m+1)/ϵ), where κ denotes the condition number and m ≥ 1. In nonsmooth (but smoothable) regimes, using Moreau smoothing retains the linear convergence rate for the resulting smoothed VS-SQN (or sVS-SQN) scheme. Notably, the nonsmooth regime allows for accommodating convex constraints. To contend with the possible unavailability of Lipschitzian and strong convexity parameters, we also provide sublinear rates for diminishing step-length variants that do not rely on knowledge of such parameters. (iii) In merely convex but smooth settings, the regularized VS-SQN scheme (rVS-SQN) displays a rate of O(1/k^(1-ϵ)) with an oracle complexity of O(1/ϵ^3). When the smoothness requirements are weakened, the rate for the regularized and smoothed VS-SQN scheme (rsVS-SQN) worsens to O(k^(-1/3)). Such statements allow for a state-dependent noise assumption under a quadratic growth property on the objective. To the best of our knowledge, the rate results are among the first available rates for quasi-Newton methods in nonsmooth regimes. Preliminary numerical evidence suggests that the schemes compare well with accelerated gradient counterparts on selected problems in stochastic optimization and machine learning, with significant benefits in ill-conditioned regimes.
UR - http://www.scopus.com/inward/record.url?scp=85125601122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125601122&partnerID=8YFLogxK
U2 - 10.1287/moor.2021.1147
DO - 10.1287/moor.2021.1147
M3 - Article
AN - SCOPUS:85125601122
SN - 0364-765X
VL - 47
SP - 690
EP - 719
JO - Mathematics of Operations Research
JF - Mathematics of Operations Research
IS - 1
ER -
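
Below is a minimal, illustrative Python sketch of the core idea named in the abstract: a stochastic quasi-Newton iteration in which gradients are averaged over a geometrically increasing sample size (the variance-reduction mechanism) and the search direction comes from a standard L-BFGS two-loop recursion. This is not the authors' rsL-BFGS/VS-SQN implementation: the paper's iterative regularization and Moreau smoothing are omitted, and the test problem, memory parameter, batch-growth factor, and step size are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
d = 20
W = rng.standard_normal((d, d))
A = W @ W.T / d + 0.1 * np.eye(d)        # SPD matrix: Hessian of the true objective
b = rng.standard_normal(d)

def sampled_grad(x, batch):
    # Average of `batch` noisy gradients of f(x) = 0.5 x'Ax - b'x;
    # averaging shrinks the noise standard deviation as 1/sqrt(batch).
    noise = rng.standard_normal((batch, d)).mean(axis=0)
    return A @ x - b + noise

def two_loop(g, s_hist, y_hist):
    # Standard L-BFGS two-loop recursion: returns an approximation of H g.
    q = g.copy()
    stack = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        stack.append((alpha, rho, s, y))
        q -= alpha * y
    if s_hist:                           # scale the initial Hessian guess
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    for alpha, rho, s, y in reversed(stack):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q

x = np.zeros(d)
s_hist, y_hist, memory = [], [], 5       # assumed memory parameter
batch, growth, step = 4, 1.2, 0.2        # assumed batch growth and step size
g = sampled_grad(x, batch)
for k in range(40):
    x_new = x - step * two_loop(g, s_hist, y_hist)
    batch = int(np.ceil(batch * growth)) # variable (increasing) sample size
    g_new = sampled_grad(x_new, batch)
    s, y = x_new - x, g_new - g
    if s @ y > 1e-10:                    # curvature check keeps updates well defined
        s_hist.append(s); y_hist.append(y)
        if len(s_hist) > memory:
            s_hist.pop(0); y_hist.pop(0)
    x, g = x_new, g_new

print("residual ||Ax - b|| =", np.linalg.norm(A @ x - b))

The growing batch is what stands in for variance reduction here: as the iterate approaches the solution, the averaged gradient's noise shrinks at the rate the sample size grows, which is the mechanism behind the linear rate the abstract cites for strongly convex problems.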