{"title":"深度循环网络插值过渡附近的玻璃动力学。","authors":"John Hertz, Joanna Tyrcha","doi":"10.1103/PhysRevE.111.055307","DOIUrl":null,"url":null,"abstract":"<p><p>We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w-w_{c}) as the width w approaches a (loss-dependent) critical value w_{c}. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_{w}, we measure the subsequent mean-square fluctuations of the weight values at times τ>τ_{w}. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_{w}. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.</p>","PeriodicalId":48698,"journal":{"name":"Physical Review E","volume":"111 5-2","pages":"055307"},"PeriodicalIF":2.2000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Glassy dynamics near the interpolation transition in deep recurrent networks.\",\"authors\":\"John Hertz, Joanna Tyrcha\",\"doi\":\"10.1103/PhysRevE.111.055307\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w-w_{c}) as the width w approaches a (loss-dependent) critical value w_{c}. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_{w}, we measure the subsequent mean-square fluctuations of the weight values at times τ>τ_{w}. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_{w}. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. 
Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.</p>\",\"PeriodicalId\":48698,\"journal\":{\"name\":\"Physical Review E\",\"volume\":\"111 5-2\",\"pages\":\"055307\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Physical Review E\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1103/PhysRevE.111.055307\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, FLUIDS & PLASMAS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physical Review E","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1103/PhysRevE.111.055307","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, FLUIDS & PLASMAS","Score":null,"Total":0}
Glassy dynamics near the interpolation transition in deep recurrent networks.
We examine learning dynamics in deep recurrent networks, focusing on behavior near the boundary in the depth-width plane that separates under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning as the transition is approached from the overparametrized side: for a given network depth, the time to reach a small training loss appears to diverge in proportion to 1/(w - w_c) as the width w approaches a (loss-dependent) critical value w_c. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_w, we measure the subsequent mean-square fluctuations of the weight values at times τ > τ_w. In the underparametrized phase, we find that they are well described by a single function of τ/τ_w. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times, when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.
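The two quantitative observations above, the 1/(w - w_c) divergence of learning times and the τ/τ_w scaling of weight fluctuations, amount to standard curve-fitting and scaling analyses on recorded training runs. The following is a minimal sketch, not the authors' code: the widths, learning times, and weight trajectory are synthetic stand-ins, and learning_time, weights_at, and msd_weights are hypothetical helper names introduced only for illustration.

# Hypothetical sketch of the two scaling analyses described in the abstract,
# using synthetic data in place of actual training runs.
import numpy as np
from scipy.optimize import curve_fit

# --- Critical slowing down: assumed form t_learn(w) = A / (w - w_c) ---
def learning_time(w, A, w_c):
    """Assumed divergence of the learning time as the width w approaches w_c from above."""
    return A / (w - w_c)

rng = np.random.default_rng(0)
widths = np.array([36, 40, 48, 64, 96, 128], dtype=float)
# Synthetic "measured" learning times generated from the assumed law plus 5% noise.
t_measured = learning_time(widths, 5.0e4, 32.0) * rng.normal(1.0, 0.05, widths.size)

(A_fit, w_c_fit), _ = curve_fit(learning_time, widths, t_measured, p0=(1.0e4, 30.0))
print(f"fitted critical width w_c ~ {w_c_fit:.1f}, amplitude A ~ {A_fit:.3g}")

# --- Aging: mean-square weight change between a waiting time tau_w and later times tau ---
# Toy stand-in for checkpointed weights: simple diffusion (which, unlike the real
# dynamics, does not age); it only illustrates the measurement procedure.
trajectory = np.cumsum(rng.normal(0.0, 1e-3, size=(50000, 50)), axis=0)

def weights_at(t):
    """Flattened weight vector saved at SGD step t (here: a row of the toy trajectory)."""
    return trajectory[int(t)]

def msd_weights(tau_w, taus):
    """Mean-square change of the weights between step tau_w and each later step in taus."""
    w0 = weights_at(tau_w)
    return np.array([np.mean((weights_at(t) - w0) ** 2) for t in taus])

for tau_w in (1000, 4000, 16000):
    taus = tau_w * np.array([1.1, 1.5, 2.0, 3.0])  # fixed ratios tau/tau_w across waiting times
    C = msd_weights(tau_w, taus)
    print(f"tau_w={tau_w:6d}  C(tau, tau_w) at tau/tau_w = 1.1, 1.5, 2, 3 :", np.round(C, 5))

In an actual experiment, t_measured would be the SGD step count needed to reach a fixed training-loss threshold at each width, and weights_at would read saved checkpoints; a τ/τ_w collapse would then show up as the curves for different tau_w falling on a single function of the ratio.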
Journal description:
Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.