Abstract
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.
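To make the abstract's central technique concrete, below is a minimal sketch (not the authors' code) of plain backprop with early stopping on a net with excess hidden units, fit to a toy 1-D target that mixes a smooth region with a highly non-linear region. The data, network size, learning rate, and patience value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target with a low non-linearity region (x < 0) and a high non-linearity region.
def target(x):
    return np.where(x < 0, 0.2 * x, np.sin(4 * x))

X = rng.uniform(-2, 2, size=(256, 1)); Y = target(X)
Xv = rng.uniform(-2, 2, size=(128, 1)); Yv = target(Xv)   # validation split for early stopping

H = 100                                   # deliberately excess hidden units
W1 = rng.normal(0, 0.5, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)

def forward(X, W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr, patience, best_val, since_best = 0.01, 200, np.inf, 0
best = (W1.copy(), b1.copy(), W2.copy(), b2.copy())

for epoch in range(20000):
    # Plain batch backprop (gradient descent) on squared error.
    h, pred = forward(X, W1, b1, W2, b2)
    grad_out = 2 * (pred - Y) / len(X)
    gW2 = h.T @ grad_out;  gb2 = grad_out.sum(0)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ grad_h;    gb1 = grad_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # Early stopping: keep the weights with the lowest validation error.
    _, pv = forward(Xv, W1, b1, W2, b2)
    val = float(np.mean((pv - Yv) ** 2))
    if val < best_val:
        best_val, since_best = val, 0
        best = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:
        since_best += 1
        if since_best >= patience:        # validation error stopped improving
            break

W1, b1, W2, b2 = best                     # restore the best-on-validation weights
print(f"stopped at epoch {epoch}, best validation MSE {best_val:.4f}")
```

In this sketch the oversized net is free to fit the sinusoidal region, while stopping on validation error halts training before the smooth region is overfit, which is the behavior the abstract attributes to backprop with early stopping.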
Original language | English (US)
---|---
Title of host publication | Advances in Neural Information Processing Systems 13 - Proceedings of the 2000 Conference, NIPS 2000
Publisher | Neural Information Processing Systems Foundation
ISBN (Print) | 0262122413, 9780262122412
State | Published - 2001
Event | 14th Annual Neural Information Processing Systems Conference, NIPS 2000 - Denver, CO, United States. Duration: Nov 27 2000 → Dec 2 2000
Other
Other | 14th Annual Neural Information Processing Systems Conference, NIPS 2000
---|---
Country/Territory | United States
City | Denver, CO
Period | 11/27/00 → 12/2/00
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing