Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li
{"title":"Attention heads of large language models.","authors":"Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li","doi":"10.1016/j.patter.2025.101176","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) have demonstrated performance approaching human levels in tasks such as long-text comprehension and mathematical reasoning, but they remain black-box systems. Understanding the reasoning bottlenecks of LLMs remains a critical challenge, as these limitations are deeply tied to their internal architecture. Attention heads play a pivotal role in reasoning and are thought to share similarities with human brain functions. In this review, we explore the roles and mechanisms of attention heads to help demystify the internal reasoning processes of LLMs. We first introduce a four-stage framework inspired by the human thought process. Using this framework, we review existing research to identify and categorize the functions of specific attention heads. Additionally, we analyze the experimental methodologies used to discover these special heads and further summarize relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"6 2","pages":"101176"},"PeriodicalIF":6.7000,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11873009/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Patterns","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.patter.2025.101176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/14 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Large language models (LLMs) have demonstrated performance approaching human levels in tasks such as long-text comprehension and mathematical reasoning, but they remain black-box systems. Understanding the reasoning bottlenecks of LLMs remains a critical challenge, as these limitations are deeply tied to their internal architecture. Attention heads play a pivotal role in reasoning and are thought to share similarities with human brain functions. In this review, we explore the roles and mechanisms of attention heads to help demystify the internal reasoning processes of LLMs. We first introduce a four-stage framework inspired by the human thought process. Using this framework, we review existing research to identify and categorize the functions of specific attention heads. Additionally, we analyze the experimental methodologies used to discover these special heads and further summarize relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions.
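For readers new to the term, an attention head is one of several parallel scaled dot-product attention units inside each Transformer layer; the review's subject is what functions these individual units perform. The sketch below is a minimal illustration of what a single head computes, not code from the paper; the weight matrices W_q, W_k, W_v and all dimensions are illustrative assumptions.

import numpy as np

def single_attention_head(x, W_q, W_k, W_v):
    """One attention head: project tokens to queries, keys, and values,
    then mix the values by softmax-normalized query-key similarity."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v           # each (seq_len, d_head)
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # scaled dot-product
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)     # softmax over key positions
    return weights @ V                            # (seq_len, d_head)

# Toy usage with illustrative sizes: 4 tokens, model dim 8, head dim 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = single_attention_head(x, W_q, W_k, W_v)
print(out.shape)  # (4, 4)

In a full model, many such heads run in parallel per layer and their outputs are concatenated; the review's premise is that individual heads like this one acquire distinct, identifiable functions during training.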