Abstract
The paper considers normalized gradient descent (NGD), a natural modification of classical gradient descent (GD) in optimization problems. A serious shortcoming of GD in nonconvex problems is that it can take arbitrarily long to escape from the neighborhood of a saddle point; in practice, this issue can significantly slow the convergence of GD, particularly in high-dimensional nonconvex problems. The paper focuses on continuous-time dynamics and shows that, contrary to GD, NGD escapes saddle points 'quickly.' Specifically, it is shown that 1) NGD 'almost never' converges to saddle points and 2) the time required for NGD to escape from a ball of radius r about a saddle point x* is at most 5κr, where κ is the condition number of the Hessian of f at x*. As a simple application of these results, a global convergence-time bound is established for NGD under mild assumptions.
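For intuition, below is a minimal discrete-time sketch of NGD in Python. The paper itself analyzes the continuous-time flow x' = -∇f(x)/‖∇f(x)‖; the forward-Euler discretization, step size, and function names here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def ngd(grad, x0, step=1e-2, iters=1000, tol=1e-12):
    """Discrete-time sketch of normalized gradient descent (NGD).

    Forward-Euler discretization of the flow x' = -grad f(x)/||grad f(x)||.
    `step`, `iters`, and `tol` are illustrative parameters (assumptions),
    not quantities from the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < tol:  # near a critical point the direction is undefined
            break
        # Unlike GD, the step direction is normalized to unit length,
        # so progress does not vanish as the gradient shrinks.
        x = x - step * (g / norm)
    return x

# Example: f(x, y) = x**2 - y**2 has a saddle at the origin;
# its gradient is (2x, -2y).
x_final = ngd(lambda x: np.array([2 * x[0], -2 * x[1]]),
              x0=np.array([1.0, 1e-6]))
```

Near a saddle of a function such as f(x, y) = x² - y², each NGD step has unit direction regardless of how small the gradient is, which is the intuition behind the fast escape-time bound; plain GD steps shrink with ‖∇f‖ and can stall arbitrarily long.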
| Original language | English (US) |
|---|---|
| Article number | 8706530 |
| Pages (from-to) | 4818-4824 |
| Number of pages | 7 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 64 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 2019 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering