Attention heads of large language models.

IF: 6.7 · Q1 · Computer Science, Artificial Intelligence
Patterns · Pub Date: 2025-02-06 · eCollection Date: 2025-02-14 · DOI: 10.1016/j.patter.2025.101176
Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li
{"title":"Attention heads of large language models.","authors":"Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li","doi":"10.1016/j.patter.2025.101176","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) have demonstrated performance approaching human levels in tasks such as long-text comprehension and mathematical reasoning, but they remain black-box systems. Understanding the reasoning bottlenecks of LLMs remains a critical challenge, as these limitations are deeply tied to their internal architecture. Attention heads play a pivotal role in reasoning and are thought to share similarities with human brain functions. In this review, we explore the roles and mechanisms of attention heads to help demystify the internal reasoning processes of LLMs. We first introduce a four-stage framework inspired by the human thought process. Using this framework, we review existing research to identify and categorize the functions of specific attention heads. Additionally, we analyze the experimental methodologies used to discover these special heads and further summarize relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 2","pages":"101176"},"PeriodicalIF":6.7000,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11873009/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Patterns","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.patter.2025.101176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/14 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Large language models (LLMs) have demonstrated performance approaching human levels in tasks such as long-text comprehension and mathematical reasoning, but they remain black-box systems. Understanding the reasoning bottlenecks of LLMs remains a critical challenge, as these limitations are deeply tied to their internal architecture. Attention heads play a pivotal role in reasoning and are thought to share similarities with human brain functions. In this review, we explore the roles and mechanisms of attention heads to help demystify the internal reasoning processes of LLMs. We first introduce a four-stage framework inspired by the human thought process. Using this framework, we review existing research to identify and categorize the functions of specific attention heads. Additionally, we analyze the experimental methodologies used to discover these special heads and further summarize relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions.
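
The review's subject, the function of individual attention heads, lends itself to a brief illustration. The sketch below is not taken from the paper; it shows one common way to inspect per-head attention patterns: running a small decoder-only model with Hugging Face transformers, collecting the per-layer attention tensors, and scoring each head on how strongly it attends to the previous token, a simple signature often used to flag "previous-token" heads. The model choice (gpt2) and the probe itself are illustrative assumptions, not methods described in the review.

```python
# Illustrative sketch (not from the paper): inspecting per-head attention
# patterns of a small decoder-only model. The model ("gpt2") and the
# previous-token probe below are assumptions chosen for demonstration.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any model that exposes attention weights would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "Large language models remain black-box systems."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
attentions = outputs.attentions
print(f"{len(attentions)} layers x {attentions[0].shape[1]} heads")

# Simple probe: average attention each head places on the previous token.
# Heads scoring close to 1.0 behave like "previous-token" heads.
for layer_idx, attn in enumerate(attentions):
    # subdiagonal of the (seq_len, seq_len) attention map, per head
    prev_token_score = attn[0].diagonal(offset=-1, dim1=-2, dim2=-1).mean(dim=-1)
    top_head = int(prev_token_score.argmax())
    print(f"layer {layer_idx:2d}: head {top_head:2d} "
          f"prev-token score {prev_token_score[top_head].item():.2f}")
```

Analogous scores can be defined for other head types catalogued in the review; the same attention tensors are the starting point for most of the discovery methods the authors survey.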

Source journal: Patterns (Decision Sciences, all)
CiteScore: 10.60
Self-citation rate: 4.60%
Articles published: 153
Review time: 19 weeks