Programmer Visual Attention During Context-Aware Code Summarization

IF 5.6 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering Pub Date : 2025-03-26 DOI:10.1109/TSE.2025.3554990

Robert Wallace;Aakash Bansal;Zachary Karas;Ningzhi Tang;Yu Huang;Toby Jia-Jun Li;Collin McMillan

Volume 51, Issue 5, Pages 1524-1537 | https://ieeexplore.ieee.org/document/10938844/

Citations: 0

Abstract

Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. Current research on modeling programmer attention has focused on using mouse cursors, keystrokes, or eye tracking equipment to map areas in a snippet of code. These approaches have traditionally mapped attention for a single method only. However, there is a knowledge gap in the literature because programming tasks such as source code summarization require programmers to use contextual knowledge that can only be found in other parts of the project, not in a single method alone. To address this knowledge gap, we conducted an in-depth human study with 10 Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rated the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors in programmer attention during context-aware code summarization. Specifically, we found that programmers need to read up to 35% fewer words (p < 0.01) over the whole session, and revisit 13% fewer words (p < 0.03) as they summarize each method during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with a higher quality summary, but this trend follows a bell-shaped curve, such that after a threshold, reading more source code leads to a significant decrease (p < 0.01) in the quality of summaries. We also gathered insight into the type of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies towards modeling programmer attention and improving context-aware automatic source code summarization.
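
The abstract reports reductions in the number of words read and revisited, but it does not say how those counts are derived from the eye-tracking data. The following is a minimal sketch of one plausible way to compute them, not the authors' method. It assumes a hypothetical, time-ordered list of word identifiers (one per fixation), treats consecutive fixations on the same word as a single visit, and counts a word as "revisited" if gaze returns to it after moving elsewhere. The names `gaze_metrics`, `GazeMetrics`, and `percent_fewer` are illustrative only.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class GazeMetrics(NamedTuple):
    words_read: int       # distinct words fixated at least once
    words_revisited: int  # distinct words fixated again after gaze moved away


def gaze_metrics(fixations: Iterable[str]) -> GazeMetrics:
    """Count distinct words read and revisited in a fixation sequence.

    `fixations` is an assumed input format: a time-ordered sequence of
    word identifiers, one entry per fixation. Consecutive fixations on
    the same word are merged into a single visit.
    """
    visits: Counter = Counter()
    previous = None
    for word in fixations:
        if word != previous:  # a new visit starts when gaze changes word
            visits[word] += 1
        previous = word
    revisited = sum(1 for count in visits.values() if count > 1)
    return GazeMetrics(words_read=len(visits), words_revisited=revisited)


def percent_fewer(earlier: int, later: int) -> float:
    """Relative reduction from `earlier` to `later`, in percent."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return 100.0 * (earlier - later) / earlier
```

Under these assumptions, the metrics for an early and a late method in a session could be compared with `percent_fewer(early.words_read, late.words_read)`. A statement such as "35% fewer words" would then correspond to a return value of 35.0.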


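Source journal

IEEE Transactions on Software Engineering (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months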

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.