Context-aware prompting for LLM-based program repair
Yingling Li, Muxin Cai, Junjie Chen, Yang Xu, Lei Huang, Jianping Li
Automated Software Engineering, 32(2), published 2025-04-18. DOI: 10.1007/s10515-025-00512-w (https://link.springer.com/article/10.1007/s10515-025-00512-w)
Abstract
Automated program repair (APR) plays a crucial role in ensuring software quality, as manual bug-fixing is extremely time-consuming and labor-intensive. Traditional APR tools (e.g., template-based approaches) struggle to generalize to different bug patterns, while deep learning (DL)-based methods rely heavily on training datasets and struggle to fix unseen bugs. Recently, large language models (LLMs) have shown great potential in APR due to their ability to generate patches, and have achieved promising results. However, their effectiveness is still constrained by casually determined context: they cannot adaptively select the specific context that each defect requires. A more effective APR approach is therefore needed, one that provides more precise and comprehensive context for a given defect to enhance the robustness of LLM-based APR. In this paper, we propose a context-aware APR approach named CodeCorrector, which designs a Chain-of-Thought (CoT) approach that follows developers' program-repair behavior. Given a failing test and its buggy file, CodeCorrector first analyzes why the test fails, using the failure message to infer a repair direction; it then selects the context information relevant to this repair direction; finally, it builds a context-aware repair prompt to guide LLMs in patch generation. Our motivation is to offer a novel perspective on enhancing LLM-based program repair through context-aware prompting, which adaptively selects specific context for a given defect. Evaluation on the widely used Defects4J benchmark (v1.2 and v2.0) shows that, with only a small number of repair rounds (as few as ten), CodeCorrector outperforms all state-of-the-art baselines on the more complex defects in Defects4J v2.0 and on the defects without fine-grained defect-localization information in Defects4J v1.2. In total, 38 defects are fixed only by CodeCorrector. We further analyze the contributions of two core components (repair directions and global context selection) to CodeCorrector's performance; repair directions in particular improve CodeCorrector by 112% in correct patches and 78% in plausible patches on Defects4J v1.2. Moreover, CodeCorrector generates more valid and correct patches, achieving a 377% improvement over the base LLM GPT-3.5 and a 268% improvement over GPT-4.
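To make the three-stage pipeline described above concrete, the following is a minimal Python sketch of how such a context-aware prompting flow could be wired together. It is an illustration under stated assumptions, not CodeCorrector's actual implementation: every name here (infer_repair_direction, select_context, build_repair_prompt, call_llm) and all prompt wording are hypothetical, and call_llm is a placeholder for whatever chat-completion client one uses (e.g., GPT-3.5 or GPT-4).

```python
# Hypothetical sketch of a context-aware repair-prompting pipeline in the
# spirit of the three stages described in the abstract. Not the paper's code.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM chat-completion call; wire up a real client here."""
    raise NotImplementedError("connect an LLM client (e.g., GPT-3.5/GPT-4)")

def infer_repair_direction(failure_message: str) -> str:
    """Stage 1: analyze why the test fails and infer a repair direction."""
    return call_llm(
        "A test failed with the following message:\n"
        f"{failure_message}\n"
        "In one sentence, state the likely root cause and the repair direction."
    )

def select_context(buggy_file: str, direction: str, max_chars: int = 4000) -> str:
    """Stage 2: keep only the parts of the buggy file relevant to the direction.

    A real system would use program analysis; this sketch crudely keeps lines
    that mention identifiers occurring in the inferred repair direction.
    """
    keywords = {tok for tok in direction.split() if tok.isidentifier()}
    relevant = [line for line in buggy_file.splitlines()
                if any(k in line for k in keywords)]
    return "\n".join(relevant)[:max_chars] or buggy_file[:max_chars]

def build_repair_prompt(failure_message: str, context: str, direction: str) -> str:
    """Stage 3: assemble the context-aware repair prompt."""
    return (
        f"Repair direction: {direction}\n\n"
        f"Failing-test message:\n{failure_message}\n\n"
        f"Relevant code context:\n{context}\n\n"
        "Generate a patched version of the relevant code."
    )

def repair(failing_test_message: str, buggy_file: str) -> str:
    """Run the full pipeline: direction, context selection, patch generation."""
    direction = infer_repair_direction(failing_test_message)
    context = select_context(buggy_file, direction)
    return call_llm(build_repair_prompt(failing_test_message, context, direction))
```

The key design point the abstract emphasizes is the ordering: the repair direction is inferred first, and context selection is conditioned on it, so each defect gets context tailored to its failure rather than a fixed, casually chosen window around the suspicious line.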
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic and collaborative systems, as well as computational models of human software-engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.