On the Generalization Ability of Unsupervised Pretraining

Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
{"title":"On the Generalization Ability of Unsupervised Pretraining.","authors":"Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distribution and tasks in pre-training and fine-tuning stage. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze generalization bound of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhances the generalization of fine-tuned model. Overall, our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"238 ","pages":"4519-4527"},"PeriodicalIF":0.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484219/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks across the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factors influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, which ultimately determine the generalization capability of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze the generalization bounds of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method applied during pre-training that further enhances the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
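
The paradigm analyzed above, reconstruction-based unsupervised pre-training of an encoder on unlabeled data followed by fine-tuning on a labeled binary classification task, can be illustrated with a minimal PyTorch sketch. Everything below (the MLP encoder/decoder, the roughly 25% masking ratio, dimensions, and optimizers) is an illustrative assumption, not the paper's implementation or its proposed regularizer.

```python
# Minimal sketch of masked-reconstruction pre-training + binary-classification fine-tuning.
# All architectures and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

d, hidden = 32, 64  # assumed input and representation dimensions
encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, d))

# --- Stage 1: unsupervised pre-training on unlabeled data (masked reconstruction) ---
x_unlabeled = torch.randn(512, d)  # placeholder for the unlabeled dataset
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    mask = (torch.rand_like(x_unlabeled) > 0.25).float()     # zero out ~25% of coordinates
    recon = decoder(encoder(x_unlabeled * mask))
    loss = ((recon - x_unlabeled) ** 2 * (1 - mask)).mean()   # reconstruct the masked entries
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: fine-tuning encoder + linear head on a labeled downstream task ---
x_labeled = torch.randn(128, d)                   # placeholder downstream data
y = torch.randint(0, 2, (128,)).float()           # binary labels
head = nn.Linear(hidden, 1)
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
for _ in range(100):
    logits = head(encoder(x_labeled)).squeeze(-1)
    ft_loss = bce(logits, y)
    ft_opt.zero_grad(); ft_loss.backward(); ft_opt.step()
```

The two stages mirror the setting studied in the paper: the representation is learned from the unlabeled (pre-training) distribution and then reused and adapted on a possibly different downstream distribution and task.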
