Abstract
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distribution and tasks in pre-training and finetuning stage. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze generalization bound of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhances the generalization of fine-tuned model. Overall, our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 4519-4527 |
| Number of pages | 9 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 238 |
| State | Published - 2024 |
| Event | 27th International Conference on Artificial Intelligence and Statistics, AISTATS 2024 - Valencia, Spain Duration: May 2 2024 → May 4 2024 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'On the Generalization Ability of Unsupervised Pretraining'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver