Overfitting and neural networks: Conjugate gradient and backpropagation

Steve Lawrence, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

237 Scopus citations

Abstract

Methods for controlling the bias/variance tradeoff typically assume that overfitting or overtraining is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g. based on validation tests), network size, or the amount of weight decay are commonly used to control the bias/variance tradeoff. However, the degree of overfitting can vary significantly throughout the input space of the model. We show that overselection of the degrees of freedom for an MLP trained with backpropagation can improve the approximation in regions of underfitting, while not significantly overfitting in other regions. This can be a significant advantage over other models. Furthermore, we show that 'better' learning algorithms such as conjugate gradient can in fact lead to worse generalization, because they can be more prone to creating varying degrees of overfitting in different regions of the input space. While experimental results cannot cover all practical situations, our results do help to explain common behavior that does not agree with theoretical expectations. Our results suggest one important reason for the relative success of MLPs, bring into question common beliefs about neural network training regarding training algorithms, overfitting, and optimal network size, suggest alternate guidelines for practical use (in terms of the training algorithm and network size selection), and help to direct future work (e.g. regarding the importance of the MLP/BP training bias, the possibility of worse performance for 'better' training algorithms, local 'smoothness' criteria, and further investigation of localized overfitting).
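
As an illustration of the kind of setup the abstract discusses, the following is a minimal sketch, not the authors' experimental code: a deliberately oversized single-hidden-layer MLP trained by plain backpropagation (batch gradient descent), with early stopping on a validation set as the global overfitting control. The target function, network size, and all hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D regression task (assumed, not from the paper):
# noisy samples of a smooth target function.
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_val = 40, 40
x_train = rng.uniform(-1, 1, (n_train, 1))
y_train = target(x_train) + 0.1 * rng.standard_normal((n_train, 1))
x_val = rng.uniform(-1, 1, (n_val, 1))
y_val = target(x_val) + 0.1 * rng.standard_normal((n_val, 1))

# Deliberately oversized hidden layer, echoing the paper's point about
# overselection of the degrees of freedom.
n_hidden = 50
W1 = 0.5 * rng.standard_normal((1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.5 * rng.standard_normal((n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

# Plain batch gradient descent (backprop), with early stopping on a
# held-out validation set -- one of the global controls the abstract
# mentions. Gradients are those of half the mean squared error; the
# factor of 2 is folded into the learning rate.
lr, patience = 0.05, 500
best_val, best_params, since_best = np.inf, None, 0
for step in range(20000):
    h, pred = forward(x_train)
    err = pred - y_train
    gW2 = h.T @ err / n_train
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)   # backprop through tanh
    gW1 = x_train.T @ dh / n_train
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    val = mse(forward(x_val)[1], y_val)
    if val < best_val:
        best_val, since_best = val, 0
        best_params = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:
        since_best += 1
        if since_best > patience:       # early stopping
            break

W1, b1, W2, b2 = best_params
print(f"train MSE: {mse(forward(x_train)[1], y_train):.4f}")
print(f"val   MSE: {best_val:.4f}")
```

Replacing the gradient-descent loop with a full-batch conjugate-gradient optimizer over the flattened weights (for example, scipy.optimize.minimize with method='CG') gives the kind of 'better' training algorithm whose generalization the paper compares against backpropagation.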

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: IEEE
Pages: 114-119
Number of pages: 6
Volume: 1
State: Published - 2000
Event: International Joint Conference on Neural Networks (IJCNN'2000) - Como, Italy
Duration: Jul 24, 2000 – Jul 27, 2000

All Science Journal Classification (ASJC) codes

  • Software
