Abstract
We present an analytical study of gradient descent algorithms applied to a classification problem in machine learning based on artificial neural networks. Our approach is based on entropy–entropy dissipation estimates that yield explicit rates. Specifically, as long as the neural nets remain within a set of "good classifiers," we establish a striking feature of the algorithm: it formally diverges as the number of gradient descent iterations ("time") tends to infinity, but this divergence is only logarithmic, while the loss function vanishes polynomially. As a consequence, the algorithm still yields a classifier with good numerical performance and may even appear to converge.
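The paper's setting involves full neural-net classifiers, but the qualitative phenomenon described in the abstract can be illustrated with a minimal toy example (an assumption for illustration, not the paper's model): gradient descent on the logistic loss for a single linearly separable example. The minimizer sits at infinity, so the iterates diverge, yet only logarithmically, while the loss decays polynomially (roughly like 1/t).

```python
import math

# Toy sketch (hypothetical, not the paper's construction): minimize
# L(w) = log(1 + exp(-w)) by gradient descent. The infimum L = 0 is
# attained only as w -> infinity, so the iterates w_t diverge -- but
# w_t grows like log(t), while L(w_t) decays like 1/t.

def loss(w):
    return math.log(1.0 + math.exp(-w))

def grad(w):
    # dL/dw = -1 / (1 + exp(w))
    return -1.0 / (1.0 + math.exp(w))

w, eta = 0.0, 1.0  # initial iterate and step size (illustrative choices)
for t in range(1, 100_001):
    w -= eta * grad(w)
    if t in (10, 100, 1_000, 10_000, 100_000):
        # w_t / log(t) approaches 1; t * L(w_t) stays bounded
        print(f"t={t:>6}  w={w:6.2f}  w/log(t)={w / math.log(t):.2f}  t*loss={t * loss(w):.2f}")
```

Heuristically, for large w the update is approximately w ← w + η e^{-w}, whose continuous-time solution satisfies e^w ≈ η t, giving w ≈ log(η t) and L(w) ≈ 1/(η t): a "formally diverging" iterate that numerically behaves like a converging, well-performing classifier.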
| Original language | English (US) |
|---|---|
| Pages (from-to) | 395-405 |
| Number of pages | 11 |
| Journal | Comptes Rendus Mathematique |
| Volume | 356 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2018 |
All Science Journal Classification (ASJC) codes
- General Mathematics