The Information of Large Language Model Geometry

Zhiquan Tan, Chenghai Li, Weiran Huang
{"title":"The Information of Large Language Model Geometry","authors":"Zhiquan Tan, Chenghai Li, Weiran Huang","doi":"arxiv-2402.03471","DOIUrl":null,"url":null,"abstract":"This paper investigates the information encoded in the embeddings of large\nlanguage models (LLMs). We conduct simulations to analyze the representation\nentropy and discover a power law relationship with model sizes. Building upon\nthis observation, we propose a theory based on (conditional) entropy to\nelucidate the scaling law phenomenon. Furthermore, we delve into the\nauto-regressive structure of LLMs and examine the relationship between the last\ntoken and previous context tokens using information theory and regression\ntechniques. Specifically, we establish a theoretical connection between the\ninformation gain of new tokens and ridge regression. Additionally, we explore\nthe effectiveness of Lasso regression in selecting meaningful tokens, which\nsometimes outperforms the closely related attention weights. Finally, we\nconduct controlled experiments, and find that information is distributed across\ntokens, rather than being concentrated in specific \"meaningful\" tokens alone.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"127 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.03471","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper investigates the information encoded in the embeddings of large language models (LLMs). We conduct simulations to analyze the representation entropy and discover a power law relationship with model sizes. Building upon this observation, we propose a theory based on (conditional) entropy to elucidate the scaling law phenomenon. Furthermore, we delve into the auto-regressive structure of LLMs and examine the relationship between the last token and previous context tokens using information theory and regression techniques. Specifically, we establish a theoretical connection between the information gain of new tokens and ridge regression. Additionally, we explore the effectiveness of Lasso regression in selecting meaningful tokens, which sometimes outperforms the closely related attention weights. Finally, we conduct controlled experiments, and find that information is distributed across tokens, rather than being concentrated in specific "meaningful" tokens alone.
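To make the first claim of the abstract concrete, the sketch below computes one common notion of representation entropy for a set of token embeddings: the Shannon entropy of the eigenvalue spectrum of the normalized Gram matrix. This is an illustrative estimator, not necessarily the one used in the paper, and the usage showing a comparison across model sizes is hypothetical.

```python
import numpy as np

def representation_entropy(H: np.ndarray) -> float:
    """Matrix-based entropy of token embeddings (illustrative sketch).

    H: array of shape (num_tokens, hidden_dim), one row per token embedding.
    Returns the Shannon entropy of the normalized eigenvalue spectrum of the
    Gram matrix, a standard proxy for how "spread out" the representation is.
    """
    H = H - H.mean(axis=0, keepdims=True)       # center the embeddings
    gram = H @ H.T                              # (num_tokens, num_tokens), PSD
    gram /= np.trace(gram) + 1e-12              # normalize to unit trace
    eigvals = np.linalg.eigvalsh(gram)
    eigvals = np.clip(eigvals, 1e-12, None)     # guard against tiny negative values
    return float(-(eigvals * np.log(eigvals)).sum())

# Hypothetical usage: compare entropy across model sizes to look for a power law.
# embeddings_by_size = {"125M": H_small, "1.3B": H_medium, "6.7B": H_large}
# for size, H in embeddings_by_size.items():
#     print(size, representation_entropy(H))
```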
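The abstract also describes using Lasso regression to select which context tokens carry information about the last token. A minimal, hedged reading of that idea: regress the last token's hidden state on the hidden states of the preceding tokens with an L1 penalty, and treat tokens with large coefficients as "selected". The alpha value and the comparison against attention weights below are illustrative choices, not the paper's reported setup.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_tokens_with_lasso(H_context: np.ndarray,
                             h_last: np.ndarray,
                             alpha: float = 0.01) -> np.ndarray:
    """Score context tokens via L1-regularized regression (illustrative sketch).

    H_context: (num_context_tokens, hidden_dim) hidden states of previous tokens.
    h_last:    (hidden_dim,) hidden state of the last token.
    Returns absolute Lasso coefficients, one per context token; larger values
    indicate tokens that contribute more to reconstructing the last token.
    """
    X = H_context.T                  # (hidden_dim, num_context_tokens)
    y = h_last                       # (hidden_dim,)
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000)
    model.fit(X, y)
    return np.abs(model.coef_)       # sparsity marks "selected" tokens

# Hypothetical comparison with attention: given a vector `attn` of attention
# weights from the last token to the context, one could rank tokens by the
# Lasso scores and by `attn`, then compare the two rankings (e.g., with a
# rank correlation).
```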