TY - GEN
T1 - Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
T2 - 36th International Conference on Machine Learning, ICML 2019
AU - Choromanska, Anna
AU - Cowen, Benjamin
AU - Kumaravel, Sadhana
AU - Luss, Ronny
AU - Rigotti, Mattia
AU - Rish, Irina
AU - Kingsbury, Brian
AU - Di Achille, Paolo
AU - Gurev, Viatcheslav
AU - Tejwani, Ravi
AU - Bouneffouf, Djallel
N1 - Publisher Copyright:
© 2019 by the Author(s).
PY - 2019
Y1 - 2019
N2 - Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
AB - Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
UR - http://www.scopus.com/inward/record.url?scp=85078054310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078054310&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85078054310
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 2041
EP - 2050
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
Y2 - 9 June 2019 through 15 June 2019
ER -