Context-aware prompting for LLM-based program repair
Yingling Li, Muxin Cai, Junjie Chen, Yang Xu, Lei Huang, Jianping Li
Automated Software Engineering, 32(2), published 2025-04-18. DOI: 10.1007/s10515-025-00512-w (https://link.springer.com/article/10.1007/s10515-025-00512-w)
Abstract
Automated program repair (APR) plays a crucial role in ensuring software quality, as manual bug-fixing is extremely time-consuming and labor-intensive. Traditional APR tools (e.g., template-based approaches) struggle to generalize to different bug patterns, while deep learning (DL)-based methods rely heavily on training datasets and struggle to fix unseen bugs. Recently, large language models (LLMs) have shown great potential in APR due to their ability to generate patches, and have achieved promising results. However, their effectiveness is still constrained by casually determined context: they cannot adaptively select the specific context that each defect requires. A more effective APR approach is therefore needed, one that provides more precise and comprehensive context for a given defect to enhance the robustness of LLM-based APR. In this paper, we propose a context-aware APR approach named CodeCorrector, which designs a Chain-of-Thought (CoT) approach that follows developers' program-repair behavior. Given a failing test and its buggy file, CodeCorrector first analyzes why the test fails, using the failure message to infer a repair direction; it then selects the context information relevant to this repair direction; finally, it builds a context-aware repair prompt to guide LLMs in patch generation. Our motivation is to offer a novel perspective on enhancing LLM-based program repair through context-aware prompting, which adaptively selects specific context for a given defect. Evaluation on the widely used Defects4J benchmark (v1.2 and v2.0) shows that, with only a small number of repair rounds (as few as ten), CodeCorrector outperforms all state-of-the-art baselines on the more complex defects in Defects4J v2.0 and on the defects without fine-grained defect-localization information in Defects4J v1.2. In total, 38 defects are fixed only by CodeCorrector. We further analyze the contributions of two core components (repair directions and global context selection) to CodeCorrector's performance; repair directions in particular improve CodeCorrector by 112% in correct patches and 78% in plausible patches on Defects4J v1.2. Moreover, CodeCorrector generates more valid and correct patches, achieving a 377% improvement over the base LLM GPT-3.5 and a 268% improvement over GPT-4.
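To make the three-stage pipeline described above concrete, the following is a minimal Python sketch of how such a context-aware prompting flow could be wired together. It is an illustration under stated assumptions, not CodeCorrector's actual implementation: every name here (infer_repair_direction, select_context, build_repair_prompt, call_llm) and all prompt wording are hypothetical, and call_llm is a placeholder for whatever chat-completion client one uses (e.g., GPT-3.5 or GPT-4).

```python
# Hypothetical sketch of a context-aware repair-prompting pipeline in the
# spirit of the three stages described in the abstract. Not the paper's code.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM chat-completion call; wire up a real client here."""
    raise NotImplementedError("connect an LLM client (e.g., GPT-3.5/GPT-4)")

def infer_repair_direction(failure_message: str) -> str:
    """Stage 1: analyze why the test fails and infer a repair direction."""
    return call_llm(
        "A test failed with the following message:\n"
        f"{failure_message}\n"
        "In one sentence, state the likely root cause and the repair direction."
    )

def select_context(buggy_file: str, direction: str, max_chars: int = 4000) -> str:
    """Stage 2: keep only the parts of the buggy file relevant to the direction.

    A real system would use program analysis; this sketch crudely keeps lines
    that mention identifiers occurring in the inferred repair direction.
    """
    keywords = {tok for tok in direction.split() if tok.isidentifier()}
    relevant = [line for line in buggy_file.splitlines()
                if any(k in line for k in keywords)]
    return "\n".join(relevant)[:max_chars] or buggy_file[:max_chars]

def build_repair_prompt(failure_message: str, context: str, direction: str) -> str:
    """Stage 3: assemble the context-aware repair prompt."""
    return (
        f"Repair direction: {direction}\n\n"
        f"Failing-test message:\n{failure_message}\n\n"
        f"Relevant code context:\n{context}\n\n"
        "Generate a patched version of the relevant code."
    )

def repair(failing_test_message: str, buggy_file: str) -> str:
    """Run the full pipeline: direction, context selection, patch generation."""
    direction = infer_repair_direction(failing_test_message)
    context = select_context(buggy_file, direction)
    return call_llm(build_repair_prompt(failing_test_message, context, direction))
```

The key design point the abstract emphasizes is the ordering: the repair direction is inferred first, and context selection is conditioned on it, so each defect gets context tailored to its failure rather than a fixed, casually chosen window around the suspicious line.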
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic and collaborative systems, as well as computational models of human software-engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.