Single-nodal spontaneous symmetry breaking in NLP models

Impact Factor 3.1 · JCR Q2, Physics, Multidisciplinary · CAS Zone 3 (Physics & Astronomy)
Shalom Rosner, Ronit D. Gross, Ella Koresh, Ido Kanter
DOI: 10.1016/j.physa.2026.131426
Journal: Physica A: Statistical Mechanics and its Applications, Volume 688, Article 131426
Publication date: 2026-04-15 (online 2026-02-26)
URL: https://www.sciencedirect.com/science/article/pii/S0378437126001627
Citations: 0

Abstract

Spontaneous symmetry breaking in statistical mechanics primarily occurs during phase transitions in the thermodynamic limit, where the Hamiltonian preserves inversion symmetry yet the low-temperature free energy exhibits reduced symmetry. Herein, we demonstrate the emergence of spontaneous symmetry breaking in natural language processing (NLP) models during both pre-training and fine-tuning, even under deterministic dynamics and within a finite training architecture. This phenomenon occurs at the level of individual attention heads, scales down to small subsets of nodes, and remains valid at the single-node level, where nodes acquire the capacity to learn a limited set of tokens after pre-training, or labels after fine-tuning for a specific classification task. As the number of nodes increases, a crossover in learning ability occurs, governed by the tradeoff between a decrease in random-guess accuracy as the number of possible outputs grows, and an enhancement from nodal cooperation that exceeds the sum of individual nodal capabilities. In contrast to spin-glass systems, where a microscopic state of frozen spins cannot be directly linked to the free-energy minimization goal, each nodal function in this framework contributes explicitly to the global network task and can be upper-bounded using convex hull analysis. Results are demonstrated using a BERT-6 architecture pre-trained on the Wikipedia dataset and fine-tuned on the FewRel classification task.
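The notion of symmetry breaking under deterministic dynamics in a finite system can be illustrated with a minimal toy example, not taken from the paper: deterministic gradient descent on a symmetric double-well loss. The loss is invariant under the sign flip of the parameter, yet each trajectory selects only one of the two symmetric minima.

```python
# Toy sketch (illustrative only, not the paper's method): deterministic
# gradient descent on the symmetric double-well loss L(w) = (w^2 - 1)^2.
# L is invariant under w -> -w, yet any trajectory started off the
# symmetric fixed point w = 0 converges to exactly one of the two
# minima w = +1 or w = -1, breaking the symmetry of the outcome.

def grad(w: float) -> float:
    # dL/dw = 4 * w * (w^2 - 1)
    return 4.0 * w * (w * w - 1.0)

def descend(w0: float, lr: float = 0.01, steps: int = 2000) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Two initial conditions related by the symmetry w -> -w select
# opposite minima, even though the dynamics is fully deterministic.
print(descend(+1e-3))  # converges near +1
print(descend(-1e-3))  # converges near -1
```

In the same spirit, the paper's finite, deterministic training dynamics drives individually symmetric nodes into distinct specialized roles (particular tokens or labels), rather than leaving them in a symmetric, undifferentiated state.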
Source journal: Physica A: Statistical Mechanics and its Applications
CiteScore: 7.20
Self-citation rate: 9.10%
Articles per year: 852
Review time: 6.6 months
Journal description: Physica A: Statistical Mechanics and its Applications. Recognized by the European Physical Society, Physica A publishes research in the field of statistical mechanics and its applications. Statistical mechanics sets out to explain the behaviour of macroscopic systems by studying the statistical properties of their microscopic constituents. Applications of the techniques of statistical mechanics are widespread, and include: applications to physical systems such as solids, liquids and gases; applications to chemical and biological systems (colloids, interfaces, complex fluids, polymers and biopolymers, cell physics); and other interdisciplinary applications to, for instance, biological, economic and sociological systems.