{"title":"深度循环网络插值过渡附近的玻璃动力学。","authors":"John Hertz, Joanna Tyrcha","doi":"10.1103/PhysRevE.111.055307","DOIUrl":null,"url":null,"abstract":"<p><p>We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w-w_{c}) as the width w approaches a (loss-dependent) critical value w_{c}. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_{w}, we measure the subsequent mean-square fluctuations of the weight values at times τ>τ_{w}. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_{w}. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.</p>","PeriodicalId":48698,"journal":{"name":"Physical Review E","volume":"111 5-2","pages":"055307"},"PeriodicalIF":2.2000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Glassy dynamics near the interpolation transition in deep recurrent networks.\",\"authors\":\"John Hertz, Joanna Tyrcha\",\"doi\":\"10.1103/PhysRevE.111.055307\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We examine learning dynamics in deep recurrent networks, focusing on the behavior near the boundary in the depth-width plane separating under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and the learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning, approaching the transition from the overparametrized side: For a given network depth, learning times to reach small training loss values appear to diverge proportional to 1/(w-w_{c}) as the width w approaches a (loss-dependent) critical value w_{c}. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_{w}, we measure the subsequent mean-square fluctuations of the weight values at times τ>τ_{w}. In the underparametrized phase, we find that they are well-described by a single function of τ/τ_{w}. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. 
Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.</p>\",\"PeriodicalId\":48698,\"journal\":{\"name\":\"Physical Review E\",\"volume\":\"111 5-2\",\"pages\":\"055307\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Physical Review E\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1103/PhysRevE.111.055307\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, FLUIDS & PLASMAS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physical Review E","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1103/PhysRevE.111.055307","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, FLUIDS & PLASMAS","Score":null,"Total":0}
Glassy dynamics near the interpolation transition in deep recurrent networks.
We examine learning dynamics in deep recurrent networks, focusing on behavior near the boundary in the depth-width plane that separates under- from overparametrized networks, known as the interpolation transition. The training data are Bach chorales in four-part harmony, and learning is by stochastic gradient descent with a cross-entropy loss function. We find critical slowing down of the learning as the transition is approached from the overparametrized side: for a given network depth, the time to reach a small training loss appears to diverge in proportion to 1/(w - w_c) as the width w approaches a (loss-dependent) critical value w_c. We identify the zero-loss limit of this value with the interpolation transition. We also study aging (the slowing down of fluctuations as the time since the beginning of learning increases). Taking a system that has been learning for a time τ_w, we measure the subsequent mean-square fluctuations of the weight values at times τ > τ_w. In the underparametrized phase, we find that they are well described by a single function of τ/τ_w. While this scaling holds approximately at short times at the transition and in the overparametrized phase, it breaks down at longer times, when the training loss gets close to the lower limit imposed by the stochastic gradient descent dynamics. Both this kind of aging and the critical slowing down are also found in certain spin glass models, suggesting that those models contain the most essential features of the learning dynamics.
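The two quantitative observations above, the 1/(w - w_c) divergence of learning times and the τ/τ_w scaling of weight fluctuations, amount to standard curve-fitting and scaling analyses on recorded training runs. The following is a minimal sketch, not the authors' code: the widths, learning times, and weight trajectory are synthetic stand-ins, and learning_time, weights_at, and msd_weights are hypothetical helper names introduced only for illustration.

# Hypothetical sketch of the two scaling analyses described in the abstract,
# using synthetic data in place of actual training runs.
import numpy as np
from scipy.optimize import curve_fit

# --- Critical slowing down: assumed form t_learn(w) = A / (w - w_c) ---
def learning_time(w, A, w_c):
    """Assumed divergence of the learning time as the width w approaches w_c from above."""
    return A / (w - w_c)

rng = np.random.default_rng(0)
widths = np.array([36, 40, 48, 64, 96, 128], dtype=float)
# Synthetic "measured" learning times generated from the assumed law plus 5% noise.
t_measured = learning_time(widths, 5.0e4, 32.0) * rng.normal(1.0, 0.05, widths.size)

(A_fit, w_c_fit), _ = curve_fit(learning_time, widths, t_measured, p0=(1.0e4, 30.0))
print(f"fitted critical width w_c ~ {w_c_fit:.1f}, amplitude A ~ {A_fit:.3g}")

# --- Aging: mean-square weight change between a waiting time tau_w and later times tau ---
# Toy stand-in for checkpointed weights: simple diffusion (which, unlike the real
# dynamics, does not age); it only illustrates the measurement procedure.
trajectory = np.cumsum(rng.normal(0.0, 1e-3, size=(50000, 50)), axis=0)

def weights_at(t):
    """Flattened weight vector saved at SGD step t (here: a row of the toy trajectory)."""
    return trajectory[int(t)]

def msd_weights(tau_w, taus):
    """Mean-square change of the weights between step tau_w and each later step in taus."""
    w0 = weights_at(tau_w)
    return np.array([np.mean((weights_at(t) - w0) ** 2) for t in taus])

for tau_w in (1000, 4000, 16000):
    taus = tau_w * np.array([1.1, 1.5, 2.0, 3.0])  # fixed ratios tau/tau_w across waiting times
    C = msd_weights(tau_w, taus)
    print(f"tau_w={tau_w:6d}  C(tau, tau_w) at tau/tau_w = 1.1, 1.5, 2, 3 :", np.round(C, 5))

In an actual experiment, t_measured would be the SGD step count needed to reach a fixed training-loss threshold at each width, and weights_at would read saved checkpoints; a τ/τ_w collapse would then show up as the curves for different tau_w falling on a single function of the ratio.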
Journal description:
Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.