{"title":"关于梯度流高斯多指标模型的学习,第一部分:一般性质和双时间尺度学习","authors":"Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien","doi":"10.1002/cpa.70006","DOIUrl":null,"url":null,"abstract":"<p>We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian gradient flow dynamics, and provide a quantitative description of its associated “saddle-to-saddle” dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function.</p>","PeriodicalId":10601,"journal":{"name":"Communications on Pure and Applied Mathematics","volume":"78 12","pages":"2354-2435"},"PeriodicalIF":2.7000,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On learning Gaussian multi-index models with gradient flow part I: General properties and two-timescale learning\",\"authors\":\"Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien\",\"doi\":\"10.1002/cpa.70006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian gradient flow dynamics, and provide a quantitative description of its associated “saddle-to-saddle” dynamics. 
Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function.</p>\",\"PeriodicalId\":10601,\"journal\":{\"name\":\"Communications on Pure and Applied Mathematics\",\"volume\":\"78 12\",\"pages\":\"2354-2435\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-07-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communications on Pure and Applied Mathematics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cpa.70006\",\"RegionNum\":1,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications on Pure and Applied Mathematics","FirstCategoryId":"100","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpa.70006","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
On learning Gaussian multi-index models with gradient flow part I: General properties and two-timescale learning
Abstract: We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions are compositions of an unknown low-rank linear projection with an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model on a timescale infinitely faster than that of the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian gradient flow dynamics, and provide a quantitative description of its associated “saddle-to-saddle” dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function.
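For concreteness, the standard Gaussian multi-index setup behind this abstract can be written as follows; the notation (f_\star, g, U, r) is the usual convention in this literature, not fixed by the abstract itself:

    f_\star(x) = g(U^\top x), \qquad x \sim \mathcal{N}(0, I_d), \quad U \in \mathbb{R}^{d \times r}, \; U^\top U = I_r, \; r \ll d,

where U spans the unknown low-rank projection and g is the unknown link function. In the single-index case r = 1, the Hermite decomposition referenced above expands the link in the normalized Hermite polynomials h_k,

    g(z) = \sum_{k \ge 0} \alpha_k h_k(z), \qquad \alpha_k = \mathbb{E}_{z \sim \mathcal{N}(0,1)}[g(z)\, h_k(z)],

and, consistent with the abstract's claim, the index of the first non-zero coefficient \alpha_k with k \ge 1 (often called the information exponent) sets the timescale of escape from the initial saddle.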
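To make the two-timescale mechanism concrete, here is a minimal, self-contained Python sketch under assumptions that go beyond the abstract: the single-index case r = 1, a polynomial least-squares fit standing in for the paper's non-parametric link estimator, full-batch gradients, and illustrative hyperparameters. It is not the authors' algorithm; it only mirrors the fast-link / slow-subspace structure described above.

import numpy as np

# Toy sketch of two-timescale learning for a single-index model (r = 1).
# Assumptions not taken from the paper: a degree-5 polynomial fit replaces
# the non-parametric link estimator, gradients are full-batch, and the
# hyperparameters (d, n, lr, steps) are illustrative.

rng = np.random.default_rng(0)
d, n, degree, lr, steps = 64, 20_000, 5, 2.0, 2000

u_star = np.zeros(d)
u_star[0] = 1.0                      # hidden direction (unknown to the learner)
g_star = lambda z: z**3 - 3 * z      # He_3 link: first non-zero Hermite index is 3

X = rng.standard_normal((n, d))
y = g_star(X @ u_star)

def fit_link(z, y, degree):
    """Fast variable: refit the link to optimality on the current projection."""
    V = np.vander(z, degree + 1, increasing=True)   # columns 1, z, ..., z^degree
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coef

def poly(z, coef):
    return sum(c * z**k for k, c in enumerate(coef))

def dpoly(z, coef):
    return sum(k * c * z**(k - 1) for k, c in enumerate(coef) if k >= 1)

u = rng.standard_normal(d)
u /= np.linalg.norm(u)               # random init: overlap with u_star ~ 1/sqrt(d)

for t in range(steps):
    z = X @ u
    coef = fit_link(z, y, degree)                 # fast (inner) problem solved exactly
    resid = poly(z, coef) - y
    grad = X.T @ (resid * dpoly(z, coef)) / n     # gradient of the squared loss in u
    grad -= (u @ grad) * u                        # project onto the sphere's tangent space
    u -= lr * grad                                # slow (outer) gradient step
    u /= np.linalg.norm(u)                        # retraction back to the sphere
    if t % 200 == 0:
        print(f"step {t:4d}  overlap |<u, u*>| = {abs(u @ u_star):.3f}")

Because the chosen link is odd, u may converge to -u_star with the fitted polynomial absorbing the sign, which is why the overlap is printed in absolute value. In runs of this kind one typically sees the saddle-to-saddle picture described in the abstract: a long plateau at overlap of order d^{-1/2} before a rapid rise toward 1, with the plateau length governed by the information exponent of the link (here 3).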