Robert Wallace, Aakash Bansal, Zachary Karas, Ningzhi Tang, Yu Huang, Toby Jia-Jun Li, Collin McMillan
Title: Programmer Visual Attention During Context-Aware Code Summarization
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 5, pp. 1524-1537
Published: 2025-03-26
DOI: 10.1109/TSE.2025.3554990
URL: https://ieeexplore.ieee.org/document/10938844/
Citations: 0
Abstract
Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. Current research on modeling programmer attention has focused on using mouse cursors, keystrokes, or eye-tracking equipment to map areas in a snippet of code. These approaches have traditionally mapped attention for only a single method. However, there is a knowledge gap in the literature because programming tasks such as source code summarization require contextual knowledge that is found in other parts of the project, not only in a single method. To address this knowledge gap, we conducted an in-depth human study with 10 Java programmers, in which each programmer wrote summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye-tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rated the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors in programmer attention during context-aware code summarization. Specifically, we found that programmers need to read up to 35% fewer words (p < 0.01) over the whole session, and revisit 13% fewer words (p < 0.03) as they summarize each method during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with higher-quality summaries. However, this trend follows a bell-shaped curve: beyond a threshold, reading more source code leads to a significant decrease (p < 0.01) in summary quality. We also gathered insight, based on programmer attention, into the types of methods in the project that provide the most contextual information for code summarization. Specifically, we observed that programmers spent the majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies toward modeling programmer attention and improving context-aware automatic source code summarization.
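The abstract reports two gaze metrics, words read and words revisited, but does not define them here. The sketch below is a minimal illustration under assumed definitions, not the study's actual procedure. It assumes a word counts as "read" once it receives any fixation. It assumes a word counts as "revisited" when gaze returns to it after moving to a different word. The function name `reading_metrics` and the (line, token) position encoding are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def reading_metrics(fixated_words: Iterable[Hashable]) -> tuple[int, int]:
    """Return (words_read, words_revisited) for one sequence of fixations.

    Hypothetical definitions (assumptions, not taken from the paper):
    - a word is "read" if it receives at least one fixation;
    - a word is "revisited" if gaze comes back to it after leaving it,
      i.e. it has two or more separate visits.
    Consecutive fixations on the same word count as a single visit.
    """
    visits: Counter = Counter()
    previous = None
    for word in fixated_words:
        # A new visit starts only when gaze moves onto a different word.
        if word != previous:
            visits[word] += 1
        previous = word
    words_read = len(visits)
    words_revisited = sum(1 for count in visits.values() if count >= 2)
    return words_read, words_revisited


# Each fixation is identified by a hypothetical (line, token) position.
fixations = [(3, 4), (3, 4), (3, 5), (7, 0), (3, 4)]
print(reading_metrics(fixations))  # (3, 1)
```

In this example, the first two fixations land on the same word and count as one visit. Gaze then leaves that word and later returns to it, so it is the only revisited word among the three words read.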
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.