Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai
{"title":"用双层 ReLU 神经网络进行可证明的多任务表征学习","authors":"Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a <i>single</i> task or (ii) they are <i>linear</i>, very little is known about the closer-to-practice case of <i>nonlinear</i> NNs trained on <i>multiple</i> tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an <math><mi>r</mi></math> -dimensional subspace within the <math><mi>d</mi> <mo>≫</mo> <mi>r</mi></math> -dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of <math><mi>d</mi></math> . In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all <math><mi>r</mi></math> ground-truth features.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"9292-9345"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486479/pdf/","citationCount":"0","resultStr":"{\"title\":\"Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks.\",\"authors\":\"Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a <i>single</i> task or (ii) they are <i>linear</i>, very little is known about the closer-to-practice case of <i>nonlinear</i> NNs trained on <i>multiple</i> tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. 
Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an <math><mi>r</mi></math> -dimensional subspace within the <math><mi>d</mi> <mo>≫</mo> <mi>r</mi></math> -dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of <math><mi>d</mi></math> . In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all <math><mi>r</mi></math> ground-truth features.</p>\",\"PeriodicalId\":74504,\"journal\":{\"name\":\"Proceedings of machine learning research\",\"volume\":\"235 \",\"pages\":\"9292-9345\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486479/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of machine learning research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a single task or (ii) they are linear, very little is known about the closer-to-practice case of nonlinear NNs trained on multiple tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an r-dimensional subspace within the d ≫ r-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of d. In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all r ground-truth features.
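To make the setting concrete, below is a minimal NumPy sketch of the scenario the abstract describes: binary tasks whose labels depend only on the projection of the input onto an r-dimensional subspace of the d-dimensional space, a two-layer ReLU network with a shared first layer and one linear head per task trained by plain gradient descent, and downstream adaptation that refits only the head. All hyperparameters (d, r, number of tasks, step size, the squared loss, the ridge solve) are illustrative choices, not the paper's exact algorithm or analysis conditions.

```python
# Minimal sketch (assumed setup, not the authors' exact algorithm):
# multi-task pretraining of a shared ReLU layer, then head-only adaptation.
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 50, 3, 64          # input dim, ground-truth subspace dim, hidden width
T, n = 30, 200               # number of pretraining tasks, samples per task

# Ground-truth r-dimensional feature subspace (orthonormal columns).
B, _ = np.linalg.qr(rng.standard_normal((d, r)))

def make_task(num):
    """Binary task: label = sign of a random linear function of B^T x."""
    w = rng.standard_normal(r)
    X = rng.standard_normal((num, d))
    y = np.sign(X @ B @ w)
    return X, y

tasks = [make_task(n) for _ in range(T)]

# Two-layer ReLU net: shared first layer W, one linear head A[t] per task.
W = 0.1 * rng.standard_normal((m, d))
A = 0.1 * rng.standard_normal((T, m))
lr, steps = 0.05, 300

for _ in range(steps):
    gW = np.zeros_like(W)
    for t, (X, y) in enumerate(tasks):
        Z = X @ W.T                                   # pre-activations, (n, m)
        H = np.maximum(Z, 0.0)                        # ReLU features
        e = H @ A[t] - y                              # residuals, squared loss
        gA = (2.0 / n) * H.T @ e                      # gradient w.r.t. head
        G = (2.0 / n) * np.outer(e, A[t]) * (Z > 0)   # dLoss/dZ
        gW += G.T @ X / T                             # accumulate shared-layer grad
        A[t] -= lr * gA
    W -= lr * gW

# How well does the span of the learned first layer align with col(B)?
U = np.linalg.svd(W, full_matrices=False)[2][:r]      # top-r right singular vectors
alignment = np.linalg.norm(U @ B, "fro") ** 2 / r     # 1.0 = perfect recovery
print(f"subspace alignment: {alignment:.3f}")

# Downstream task: freeze W, refit only a linear head (ridge least squares).
Xd, yd = make_task(400)
Xtr, ytr, Xte, yte = Xd[:200], yd[:200], Xd[200:], yd[200:]
Htr = np.maximum(Xtr @ W.T, 0.0)
Hte = np.maximum(Xte @ W.T, 0.0)
a = np.linalg.solve(Htr.T @ Htr + 1e-3 * np.eye(m), Htr.T @ ytr)
acc = np.mean(np.sign(Hte @ a) == yte)
print(f"downstream accuracy (head-only fine-tuning): {acc:.3f}")
```

In this sketch, the alignment score tracks whether the shared first layer has recovered the ground-truth subspace, and the downstream step mirrors the common practice the abstract mentions of re-training only the last linear layer on a new task drawn from the same subspace.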
