TY - GEN
T1 - Compositional Stochastic Average Gradient for Machine Learning and Related Applications
AU - Hsieh, Tsung Yu
AU - EL-Manzalawy, Yasser
AU - Sun, Yiwei
AU - Honavar, Vasant
N1 - Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - Many machine learning and statistical inference problems require minimization of a composition of expected value functions (CEVF). Of particular interest are the finite-sum versions of such compositional optimization problems (FS-CEVF). Compositional stochastic variance reduced gradient (C-SVRG) methods, which combine stochastic compositional gradient descent (SCGD) and stochastic variance reduced gradient descent (SVRG) methods, are the state-of-the-art methods for FS-CEVF problems. We introduce compositional stochastic average gradient descent (C-SAG), a novel extension of the stochastic average gradient method (SAG) to minimize compositions of finite-sum functions. C-SAG, like SAG, estimates the gradient by incorporating memory of previous gradient information. We present theoretical analyses of C-SAG which show that C-SAG, like C-SVRG, achieves a linear convergence rate for strongly convex objective functions; however, C-SAG achieves lower oracle query complexity per iteration than C-SVRG. Finally, we present results of experiments showing that C-SAG converges substantially faster than full gradient (FG) as well as C-SVRG.
AB - Many machine learning and statistical inference problems require minimization of a composition of expected value functions (CEVF). Of particular interest are the finite-sum versions of such compositional optimization problems (FS-CEVF). Compositional stochastic variance reduced gradient (C-SVRG) methods, which combine stochastic compositional gradient descent (SCGD) and stochastic variance reduced gradient descent (SVRG) methods, are the state-of-the-art methods for FS-CEVF problems. We introduce compositional stochastic average gradient descent (C-SAG), a novel extension of the stochastic average gradient method (SAG) to minimize compositions of finite-sum functions. C-SAG, like SAG, estimates the gradient by incorporating memory of previous gradient information. We present theoretical analyses of C-SAG which show that C-SAG, like C-SVRG, achieves a linear convergence rate for strongly convex objective functions; however, C-SAG achieves lower oracle query complexity per iteration than C-SVRG. Finally, we present results of experiments showing that C-SAG converges substantially faster than full gradient (FG) as well as C-SVRG.
UR - http://www.scopus.com/inward/record.url?scp=85057092729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057092729&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-03493-1_77
DO - 10.1007/978-3-030-03493-1_77
M3 - Conference contribution
AN - SCOPUS:85057092729
SN - 9783030034924
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 740
EP - 752
BT - Intelligent Data Engineering and Automated Learning – IDEAL 2018 - 19th International Conference, Proceedings
A2 - Yin, Hujun
A2 - Novais, Paulo
A2 - Camacho, David
A2 - Tallón-Ballesteros, Antonio J.
PB - Springer Verlag
T2 - 19th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2018
Y2 - 21 November 2018 through 23 November 2018
ER -