Critical Phase Transition in a Large Language Model
Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima
arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-08
DOI: arxiv-2406.05335 (https://doi.org/arxiv-2406.05335)
Abstract
The performance of large language models (LLMs) strongly depends on the
temperature parameter. Empirically, at very low temperatures, LLMs
generate sentences with clear repetitive structures, while at very high
temperatures, generated sentences are often incomprehensible. In this study,
using GPT-2, we numerically demonstrate that the difference between the two
regimes is not just a smooth change but a phase transition with singular,
divergent statistical quantities. Our extensive analysis shows that critical
behaviors, such as a power-law decay of correlation in a text, emerge in the
LLM at the transition temperature as well as in a natural language dataset. We
also argue that several statistical quantities characterizing the criticality
should be useful for evaluating the performance of LLMs.
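
To make the role of the temperature parameter concrete, the following is a minimal sketch that assumes the conventional definition, in which next-token probabilities are proportional to exp(z_i / T) for logits z_i. The abstract does not spell out this formula, and the logits, the function names `softmax_with_temperature` and `entropy`, and the four-word vocabulary are all hypothetical illustrations, not the authors' code or GPT-2 output.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities with p_i proportional to exp(z_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; close to 0 when one token dominates."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits for a four-word vocabulary (not from GPT-2).
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:>4}: max p = {max(probs):.3f}, entropy = {entropy(probs):.3f}")
```

Under this assumed definition, a low temperature concentrates almost all probability on the top-scoring token, which is consistent with the repetitive regime described above, while a high temperature flattens the distribution toward uniform, consistent with the incomprehensible regime. The sketch illustrates only the single-token distribution; it does not reproduce the phase transition or the correlation analysis reported in the study.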