Critical Phase Transition in a Large Language Model

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima
Journal: arXiv - PHYS - Disordered Systems and Neural Networks
Published: 2024-06-08
DOI: arxiv-2406.05335 (https://doi.org/arxiv-2406.05335)
Citations: 0

Abstract

The performance of large language models (LLMs) depends strongly on the temperature parameter. Empirically, at very low temperatures LLMs generate sentences with clear repetitive structures, while at very high temperatures the generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also argue that several statistical quantities characterizing the criticality should be useful for evaluating the performance of LLMs.
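For readers unfamiliar with the temperature parameter, the following is a minimal sketch of how temperature is commonly applied when sampling from a language model: the logits are divided by T before the softmax. This is the standard textbook formulation, not the authors' code; the function names and the three example logits are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i proportional to exp(logit_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for a three-token vocabulary (illustration only).
logits = [2.0, 1.0, 0.0]

for T in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}: probs={[round(p, 3) for p in probs]}, "
          f"entropy={entropy(probs):.3f}")
```

At low T the distribution concentrates on the highest-logit token, which is consistent with the repetitive regime described in the abstract; at high T it approaches uniform, consistent with the incomprehensible regime. The sketch only shows the sampling distribution for a single token and says nothing about the phase transition itself.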