TY - JOUR
T1 - Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions
AU - Wang, Mengdi
AU - Fang, Ethan X.
AU - Liu, Han
N1 - Publisher Copyright:
© 2016, Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society.
PY - 2017/1/1
Y1 - 2017/1/1
AB - Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., the problem min_x E_v[f_v(E_w[g_w(x)])]. To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution using noisy sample gradients of f_v and g_w, and it uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations running on different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^{-1/4}) in the general case and O(k^{-2/3}) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^{-2/7}) in the general case and O(k^{-4/5}) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we also provide the corresponding convergence rate analysis. The stochastic setting in which one optimizes compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, and beyond.
UR - http://www.scopus.com/inward/record.url?scp=84966270105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966270105&partnerID=8YFLogxK
U2 - 10.1007/s10107-016-1017-3
DO - 10.1007/s10107-016-1017-3
M3 - Article
AN - SCOPUS:84966270105
SN - 0025-5610
VL - 161
SP - 419
EP - 449
JO - Mathematical Programming
JF - Mathematical Programming
IS - 1-2
ER -
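
A minimal sketch (appended after the record, not part of the RIS export) of the basic two-time-scale SCGD iteration described in the abstract, applied to min_x E_v[f_v(E_w[g_w(x)])]. The toy quadratic problem, the function names, and the step-size exponents below are illustrative assumptions, not the authors' reference code; the update itself follows the tracking-variable scheme the abstract describes.

```python
# Illustrative SCGD sketch (assumed test problem, not from the cited paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy composition: g_w(x) = A_w x with E[A_w] = A, f_v(y) = 0.5*||y - b_v||^2
# with E[b_v] = b, so the expected objective is 0.5*||A x - b||^2 (up to a constant).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

def sample_g(x):
    """One noisy sample of the inner map g_w(x) and its Jacobian."""
    A_w = A + 0.1 * rng.standard_normal(A.shape)
    return A_w @ x, A_w

def sample_grad_f(y):
    """One noisy sample of the outer gradient grad f_v(y)."""
    b_v = b + 0.1 * rng.standard_normal(b.shape)
    return y - b_v

def scgd(x0, iters=20000):
    x, y = x0.copy(), np.zeros_like(b)
    for k in range(1, iters + 1):
        alpha = 1.0 / k ** 0.75   # slow step size for the decision variable x
        beta = 1.0 / k ** 0.5     # faster step size for the tracking variable y
        g_val, jac_g = sample_g(x)
        y = (1.0 - beta) * y + beta * g_val          # track E_w[g_w(x)]
        x = x - alpha * jac_g.T @ sample_grad_f(y)   # quasi-gradient step on x
    return x

print(scgd(np.array([5.0, 5.0])))  # approaches argmin 0.5*||Ax - b||^2, i.e. [0.5, -1.0]
```

The two step-size sequences realize the "two iterations with different time scales" mentioned in the abstract: the auxiliary variable y averages inner-function samples quickly, while x moves slowly enough that y remains an accurate estimate of E_w[g_w(x)].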