Glassy dynamics near the interpolation transition in deep recurrent networks.

IF 2.2 · CAS Zone 3 (Physics & Astronomy) · JCR Q2 · PHYSICS, FLUIDS & PLASMAS
John Hertz, Joanna Tyrcha
{"title":"Glassy dynamics near the interpolation transition in deep recurrent networks.","authors":"John Hertz, Joanna Tyrcha","doi":"10.1103/PhysRevE.111.055307","DOIUrl":null,"url":null,"abstract":"<p><p>We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w-w_{c}) as the width w approaches a (loss-dependent) critical value w_{c}. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_{w}, we measure the subsequent mean-square fluctuations of the weight values at times τ>τ_{w}. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_{w}. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.</p>","PeriodicalId":48698,"journal":{"name":"Physical Review E","volume":"111 5-2","pages":"055307"},"PeriodicalIF":2.2000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physical Review E","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1103/PhysRevE.111.055307","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, FLUIDS & PLASMAS","Score":null,"Total":0}
Citations: 0

Abstract

We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w - w_c) as the width w approaches a (loss-dependent) critical value w_c. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_w, we measure the subsequent mean-square fluctuations of the weight values at times τ > τ_w. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_w. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.
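To make the two measurements described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code: the widths, learning times, and random-walk "weight snapshots" are synthetic placeholders, and aging_curve is a hypothetical helper. It illustrates (i) extracting a critical width w_c from the divergence of learning times, using the fact that if t_learn ∝ 1/(w - w_c) then 1/t_learn is linear in w with x-intercept w_c, and (ii) computing the mean-square weight fluctuation Δ(τ, τ_w) whose collapse as a function of τ/τ_w is the aging scaling discussed above.

```python
# Sketch of the two measurements described in the abstract, with synthetic data.
import numpy as np

# --- (i) critical slowing down: fit learning time ~ A / (w - w_c) --------------
# Hypothetical learning times (steps to reach a fixed small training loss) at
# several widths w for one fixed depth; real values would come from training runs.
widths = np.array([24, 28, 32, 40, 48, 64], dtype=float)
learn_times = np.array([9100, 4800, 3300, 2100, 1500, 1000], dtype=float)

# If t_learn = A / (w - w_c), then 1/t_learn = (w - w_c) / A is linear in w,
# so a straight-line fit of 1/t_learn versus w gives w_c as the x-intercept.
slope, intercept = np.polyfit(widths, 1.0 / learn_times, 1)
w_c = -intercept / slope
print(f"estimated critical width w_c ≈ {w_c:.1f}")

# --- (ii) aging: mean-square weight fluctuations after waiting time tau_w ------
def aging_curve(weight_snapshots, tau_w_index, tau_indices):
    """Delta(tau, tau_w) = mean over weights of [w_i(tau) - w_i(tau_w)]^2.

    weight_snapshots: array of shape (n_times, n_weights), weights saved during SGD.
    """
    ref = weight_snapshots[tau_w_index]
    return np.array([np.mean((weight_snapshots[t] - ref) ** 2) for t in tau_indices])

# Toy usage with random-walk "weights"; in the paper the collapse of
# Delta(tau, tau_w) against tau / tau_w is checked across several waiting times.
rng = np.random.default_rng(0)
snapshots = np.cumsum(rng.normal(size=(2000, 500)), axis=0) * 1e-3
tau_w = 200
taus = np.arange(tau_w + 1, 2000, 100)
delta = aging_curve(snapshots, tau_w, taus)
print("tau/tau_w:", np.round(taus / tau_w, 2)[:5], "...")
print("Delta   :", np.round(delta, 5)[:5], "...")
```

The reciprocal fit is used only because the inverse of a simple pole is linear in w; since the abstract notes that w_c is loss-dependent, such a fit would be repeated for each target training-loss value.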

Source journal
Physical Review E
Subject categories: PHYSICS, FLUIDS & PLASMAS; PHYSICS, MATHEMATICAL
CiteScore: 4.50
Self-citation rate: 16.70%
Articles published per year: 2110
Journal description: Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.