Dynamical Mean-Field Theory of Self-Attention Neural Networks

Ángel Poc-López, Miguel Aguilera
{"title":"自注意神经网络的动态平均场理论","authors":"Ángel Poc-López, Miguel Aguilera","doi":"arxiv-2406.07247","DOIUrl":null,"url":null,"abstract":"Transformer-based models have demonstrated exceptional performance across\ndiverse domains, becoming the state-of-the-art solution for addressing\nsequential machine learning problems. Even though we have a general\nunderstanding of the fundamental components in the transformer architecture,\nlittle is known about how they operate or what are their expected dynamics.\nRecently, there has been an increasing interest in exploring the relationship\nbetween attention mechanisms and Hopfield networks, promising to shed light on\nthe statistical physics of transformer networks. However, to date, the\ndynamical regimes of transformer-like models have not been studied in depth. In\nthis paper, we address this gap by using methods for the study of asymmetric\nHopfield networks in nonequilibrium regimes --namely path integral methods over\ngenerating functionals, yielding dynamics governed by concurrent mean-field\nvariables. Assuming 1-bit tokens and weights, we derive analytical\napproximations for the behavior of large self-attention neural networks coupled\nto a softmax output, which become exact in the large limit size. 
Our findings\nreveal nontrivial dynamical phenomena, including nonequilibrium phase\ntransitions associated with chaotic bifurcations, even for very simple\nconfigurations with a few encoded features and a very short context window.\nFinally, we discuss the potential of our analytic approach to improve our\nunderstanding of the inner workings of transformer models, potentially reducing\ncomputational training costs and enhancing model interpretability.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dynamical Mean-Field Theory of Self-Attention Neural Networks\",\"authors\":\"Ángel Poc-López, Miguel Aguilera\",\"doi\":\"arxiv-2406.07247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transformer-based models have demonstrated exceptional performance across\\ndiverse domains, becoming the state-of-the-art solution for addressing\\nsequential machine learning problems. Even though we have a general\\nunderstanding of the fundamental components in the transformer architecture,\\nlittle is known about how they operate or what are their expected dynamics.\\nRecently, there has been an increasing interest in exploring the relationship\\nbetween attention mechanisms and Hopfield networks, promising to shed light on\\nthe statistical physics of transformer networks. However, to date, the\\ndynamical regimes of transformer-like models have not been studied in depth. In\\nthis paper, we address this gap by using methods for the study of asymmetric\\nHopfield networks in nonequilibrium regimes --namely path integral methods over\\ngenerating functionals, yielding dynamics governed by concurrent mean-field\\nvariables. 
Assuming 1-bit tokens and weights, we derive analytical\\napproximations for the behavior of large self-attention neural networks coupled\\nto a softmax output, which become exact in the large limit size. Our findings\\nreveal nontrivial dynamical phenomena, including nonequilibrium phase\\ntransitions associated with chaotic bifurcations, even for very simple\\nconfigurations with a few encoded features and a very short context window.\\nFinally, we discuss the potential of our analytic approach to improve our\\nunderstanding of the inner workings of transformer models, potentially reducing\\ncomputational training costs and enhancing model interpretability.\",\"PeriodicalId\":501066,\"journal\":{\"name\":\"arXiv - PHYS - Disordered Systems and Neural Networks\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - PHYS - Disordered Systems and Neural Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2406.07247\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Disordered Systems and Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.07247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Transformer-based models have demonstrated exceptional performance across diverse domains, becoming the state-of-the-art solution for addressing sequential machine learning problems. Even though we have a general understanding of the fundamental components in the transformer architecture, little is known about how they operate or what their expected dynamics are. Recently, there has been increasing interest in exploring the relationship between attention mechanisms and Hopfield networks, promising to shed light on the statistical physics of transformer networks. However, to date, the dynamical regimes of transformer-like models have not been studied in depth. In this paper, we address this gap using methods for the study of asymmetric Hopfield networks in nonequilibrium regimes, namely path integral methods over generating functionals, yielding dynamics governed by concurrent mean-field variables. Assuming 1-bit tokens and weights, we derive analytical approximations for the behavior of large self-attention neural networks coupled to a softmax output, which become exact in the large-size limit. Our findings reveal nontrivial dynamical phenomena, including nonequilibrium phase transitions associated with chaotic bifurcations, even for very simple configurations with a few encoded features and a very short context window. Finally, we discuss the potential of our analytic approach to improve our understanding of the inner workings of transformer models, potentially reducing computational training costs and enhancing model interpretability.
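The setup the abstract describes, 1-bit tokens and weights fed through self-attention and a softmax and iterated over a short context window, can be sketched as a toy dynamical map. Everything below (the dimensions, the sign-binarization, the update rule) is an illustrative assumption chosen for intuition, not the paper's actual construction, which proceeds analytically via path integrals over generating functionals.

```python
import numpy as np

# Toy sketch, assuming a drastically simplified model: one self-attention
# layer with 1-bit (+/-1) tokens and weights coupled to a softmax,
# iterated as a discrete-time dynamical system.

rng = np.random.default_rng(0)
N = 64      # token dimension (illustrative choice)
T = 4       # context window length; the paper notes even short windows suffice
beta = 1.0  # inverse-temperature scale inside the softmax

def binarize(x):
    """Map real values to +/-1 (1-bit quantization)."""
    return np.where(x >= 0, 1.0, -1.0)

# 1-bit query/key/value weight matrices
Wq, Wk, Wv = (binarize(rng.standard_normal((N, N))) for _ in range(3))

def attention_step(X):
    """One binarized self-attention update on a context X of shape (T, N)."""
    Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T
    scores = beta * (Q @ K.T) / np.sqrt(N)                # (T, T) attention logits
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                     # row-wise softmax
    return binarize(A @ V)                                # re-quantize the output

X = binarize(rng.standard_normal((T, N)))                 # random +/-1 context
for _ in range(20):                                       # iterate the dynamics
    X = attention_step(X)

m = X.mean()  # crude order parameter: mean activation over the context
print(X.shape, m)
```

Tracking an order parameter like `m` across iterations while sweeping `beta` is the kind of experiment where the paper's mean-field theory predicts nonequilibrium phase transitions and chaotic bifurcations; this sketch only provides a concrete object to run such sweeps on.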