Latest Articles in IEEE Transactions on Software Engineering

The Impact of Prompt Programming on Function-Level Code Generation
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-07-10 DOI: 10.1109/TSE.2025.3587794
Ranim Khojah;Francisco Gomes de Oliveira Neto;Mazen Mohamad;Philipp Leitner
Abstract: Large Language Models (LLMs) are increasingly used by software engineers for code generation. However, limitations of LLMs such as irrelevant or incorrect code have highlighted the need for prompt programming (or prompt engineering), where engineers apply specific prompt techniques (e.g., chain-of-thought or input-output examples) to improve the generated code. While some prompt techniques have been studied, the impact of different techniques, and their interactions, on code generation is still not fully understood. In this study, we introduce CodePromptEval, a dataset of 7,072 prompts designed to evaluate five prompt techniques (few-shot, persona, chain-of-thought, function signature, list of packages) and their effect on the correctness, similarity, and quality of complete functions generated by three LLMs (GPT-4o, Llama3, and Mistral). Our findings show that while certain prompt techniques significantly influence the generated code, combining multiple techniques does not necessarily improve the outcome. Additionally, we observed a trade-off between correctness and quality when using prompt techniques. Our dataset and replication package enable future research on improving LLM-generated code and evaluating new prompt techniques.
Vol. 51, no. 8, pp. 2381-2395. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11077752
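The five techniques studied are essentially composable prompt fragments. As a purely illustrative sketch (not code from the paper's replication package; all names are invented here), combining them might look like:

```python
# Illustrative sketch of composing the five studied prompt techniques --
# few-shot, persona, chain-of-thought, function signature, list of packages --
# into a single code-generation prompt. Not the paper's actual tooling.

def build_prompt(task, *, few_shot=None, persona=None, chain_of_thought=False,
                 signature=None, packages=None):
    """Assemble a prompt from optional technique fragments."""
    parts = []
    if persona:
        parts.append(f"You are {persona}.")
    if few_shot:
        for example_input, example_output in few_shot:
            parts.append(f"Example input:\n{example_input}\n"
                         f"Example output:\n{example_output}")
    parts.append(f"Task: {task}")
    if signature:
        parts.append(f"Use this function signature:\n{signature}")
    if packages:
        parts.append("You may use these packages: " + ", ".join(packages))
    if chain_of_thought:
        parts.append("Think step by step before writing the final code.")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Return the median of a list of numbers.",
    persona="an experienced Python developer",
    signature="def median(xs: list[float]) -> float:",
    packages=["statistics"],
    chain_of_thought=True,
)
```

Each keyword argument toggles one technique independently, which is what makes a full-factorial evaluation like CodePromptEval's (every combination of techniques) straightforward to enumerate.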
Citations: 0
Practitioners’ Expectations on Log Anomaly Detection
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-07-08 DOI: 10.1109/TSE.2025.3586700
Xiaoxue Ma;Yishu Li;Jacky Keung;Xiao Yu;Huiqi Zou;Zhen Yang;Federica Sarro;Earl T. Barr
Abstract: Log anomaly detection has become a common practice for software engineers to analyze software system behavior. Despite significant research efforts in log anomaly detection over the past decade, it remains unclear what practitioners’ expectations of log anomaly detection are and whether current research meets their needs. To fill this gap, we conduct an empirical study, surveying 312 practitioners from 36 countries about their expectations of log anomaly detection. In particular, we investigate various factors influencing practitioners’ willingness to adopt log anomaly detection tools. We then perform a literature review on log anomaly detection, focusing on publications in premier venues from 2015 to 2025, to compare practitioners’ needs with the current state of research. Based on this comparison, we highlight the directions researchers should focus on to develop log anomaly detection techniques that better meet practitioners’ expectations.
Vol. 51, no. 9, pp. 2455-2471.
Citations: 0
Open Source, Hidden Costs: A Systematic Literature Review on OSS License Management
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-07-07 DOI: 10.1109/TSE.2025.3586411
Boyuan Li;Chengwei Liu;Lingling Fan;Sen Chen;Zhenlin Zhang;Zheli Liu
Abstract: Integrating third-party software components is a common practice in modern software development, offering significant advantages in terms of efficiency and innovation. However, this practice is fraught with risks related to software licensing, and a lack of understanding may lead to disputes that pose serious legal and operational challenges. To address these challenges, both academia and industry have conducted various investigations and proposed solutions and tools, yet significant limitations remain. Moreover, the rapid evolution of open-source software (OSS) licenses, as well as rapidly adopted generative software engineering techniques such as large language models for code (CodeLLMs), are placing greater demands on the systematic management of software license risks. To unveil these challenges and explore possible future directions, we conduct the first systematic literature review (SLR) on 80 carefully selected OSS license-related papers, classifying existing research into three key categories: license identification, license risk assessment, and license risk mitigation. Based on these, we discuss the challenges in existing solutions, summarize opportunities that shed light on future research directions, and offer practical recommendations for practitioners. We hope this thorough review will help bridge the gaps between academia and industry and accelerate the ecosystem-wide governance of software license risks within the software engineering community.
Vol. 51, no. 9, pp. 2432-2454.
Citations: 0
On the Effectiveness of LLM-as-a-Judge for Code Generation and Summarization
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-07-04 DOI: 10.1109/TSE.2025.3586082
Giuseppe Crupi;Rosalia Tufano;Alejandro Velasco;Antonio Mastropaolo;Denys Poshyvanyk;Gabriele Bavota
Abstract: Large Language Models (LLMs) have recently been exploited as judges for complex natural language processing tasks, such as Q&A (Question & Answer). The basic idea is to delegate to an LLM the assessment of the “quality” of the output produced by an automated technique (often another LLM) for tasks for which (i) quantitative metrics would only tell part of the story, and (ii) a large-scale human-based evaluation would be too expensive. LLMs-as-a-judge, if proven effective for a specific task, can also unlock new possibilities for automation, with several LLMs proposing a solution for a given instance of the task (e.g., an answer to a question) and others judging and deciding what is the best output to show the user. We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization. The rationale for choosing these tasks is two-fold. First, quantitative metrics are usually not enough for the assessment of code summarizers/generators; for example, it is well documented that metrics such as BLEU are quite weak proxies for the quality of generated summaries. Second, even state-of-the-art techniques still struggle with complex instances of these tasks (e.g., summarizing a long, complex function), making them good candidates for benefiting from more advanced solutions envisioning collaboration among LLMs. For code generation, we check whether eight LLMs are able to judge the correctness of 1,405 Java methods and 1,281 Python functions generated by the same LLMs or implemented by humans. For code summarization, we compare the judgments of five LLMs to those provided by nine humans for ~1.2k summaries related to both Java and Python functions. Our findings show that GPT-4-turbo is the best LLM in terms of judging capabilities for both tasks, with “smaller” LLMs featuring tens of billions of parameters not being able to cope with judging tasks. However, even the best-performing LLM frequently misjudges the correctness of the code and the quality of the summaries.
Vol. 51, no. 8, pp. 2329-2345.
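The generate-then-judge pipeline the abstract describes can be sketched in a few lines. In this hypothetical illustration the LLM judge is replaced by a pass-rate stub (running candidates against reference tests), since no model is available here; only the selection pattern, not the judging mechanism, mirrors the paper:

```python
# Sketch of the LLM-as-a-judge selection pattern: several "generator" models
# propose candidates, a judge scores each, and the best-scoring candidate is
# shown to the user. The judge below is a stand-in stub that scores by pass
# rate on reference tests, NOT an actual LLM call.

def judge_score(candidate_source: str, reference_tests) -> float:
    """Stub judge: fraction of reference tests the candidate passes."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
    except Exception:
        return 0.0
    fn = namespace.get("add")
    if fn is None:
        return 0.0
    passed = sum(1 for args, want in reference_tests if fn(*args) == want)
    return passed / len(reference_tests)

def pick_best(candidates, reference_tests):
    """Return the candidate the judge rates highest."""
    return max(candidates, key=lambda c: judge_score(c, reference_tests))

candidates = [
    "def add(a, b):\n    return a - b",   # buggy candidate
    "def add(a, b):\n    return a + b",   # correct candidate
]
tests = [((1, 2), 3), ((0, 0), 0)]
best = pick_best(candidates, tests)
```

Swapping `judge_score` for a prompt to a judging LLM yields the multi-model collaboration the abstract envisions.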
Citations: 0
TransferFuzz-Pro: Large Language Model Driven Code Debugging Technology for Verifying Propagated Vulnerability
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-07-03 DOI: 10.1109/TSE.2025.3584774
Siyuan Li;Kaiyu Xie;Yuekang Li;Hong Li;Yimo Ren;Limin Sun;Hongsong Zhu
Abstract: Code reuse in software development frequently facilitates the spread of vulnerabilities, leading to imprecise scopes of affected software in CVE reports. Traditional methods focus primarily on detecting reused vulnerability code in target software but cannot confirm whether these vulnerabilities can be triggered in new software contexts. In previous work, we introduced the TransferFuzz framework to address this gap by using historical trace-based fuzzing. However, its effectiveness is constrained by the need for manual intervention and its reliance on source code instrumentation. To overcome these limitations, we propose TransferFuzz-Pro, a novel framework that integrates Large Language Model (LLM)-driven code debugging technology. By leveraging an LLM for automated, human-like debugging and Proof-of-Concept (PoC) generation, combined with binary-level instrumentation, TransferFuzz-Pro extends verification capabilities to a wider range of targets. Our evaluation shows that TransferFuzz-Pro is significantly faster and can automatically validate vulnerabilities that were previously unverifiable with conventional methods. Notably, it expands the number of affected software instances for 15 CVE-listed vulnerabilities from 15 to 53 and successfully generates PoCs for various Linux distributions. These results demonstrate that TransferFuzz-Pro effectively verifies vulnerabilities introduced by code reuse in target software and automatically generates PoCs.
Vol. 51, no. 8, pp. 2396-2411.
Citations: 0
COTE: Predicting Code-to-Test Co-Evolution by Integrating Link Analysis and Pre-Trained Language Model Techniques
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-06-27 DOI: 10.1109/TSE.2025.3583027
Yuyong Liu;Zhifei Chen;Lin Chen;Yanhui Li;Xuansong Li;Wei Song
Abstract: Tests, as an essential artifact, should co-evolve with the production code to ensure that the production code satisfies its specification. However, developers often postpone or even forget to update tests, leaving them outdated and lagging behind the code. When predicting which tests need to be updated after a production code change, it is challenging to identify all related tests and determine their change probabilities due to complex change scenarios. This paper fills that gap and proposes a hybrid approach named COTE to predict code-to-test co-evolution. We first compute the linked test candidates based on different code-to-test dependencies. After that, we identify common co-change patterns by building a method-level dependence graph. For the remaining ambiguous patterns, we leverage a pre-trained language model that captures the semantic features of code and the change reasons contained in commit messages to judge a test’s likelihood of being updated. Experiments on our datasets, consisting of 6,314 samples extracted from 5,000 Java projects, show that COTE outperforms state-of-the-art approaches, achieving a precision of 89.0% and a recall of 71.6%. This work can help practitioners reduce test maintenance costs and improve software quality.
Vol. 51, no. 8, pp. 2232-2253.
Citations: 0
OneMoreTest: A Learning-Based Approach to Generating and Selecting Fault-Revealing Unit Tests
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-06-25 DOI: 10.1109/TSE.2025.3581556
Wei Wei;Yanjie Jiang;Yahui Li;Lu Zhang;Hui Liu
Abstract: Developers often manually design a few unit tests for a given method under development. After passing such manually designed tests, however, they usually have to turn to automated test case generation tools like EvoSuite and Randoop for more thorough testing. Although the automatically generated tests may achieve high coverage, they rarely identify hard-to-detect defects automatically because of the well-known test oracle problem: it is challenging to tell whether an output is correct or incorrect without an explicit test oracle (expected output). Consequently, developers must manually select and verify a few suspicious test cases to identify hard-to-detect defects. To this end, we propose a novel approach, called OneMoreTest, to generating and selecting the most suspicious tests for manual verification. Based on a manually designed passing test, OneMoreTest automatically generates millions of input-output pairs for the method under test (MUT) with mutation-based fuzzing. It then trains an automatically generated neural network to simulate the MUT’s behavior. For new tests automatically generated for the same MUT, OneMoreTest presents developers with the top-k most suspicious tests, i.e., those with the greatest distances between their actual output and the estimated output (the network’s output). Our evaluation on real-world faulty methods suggests that OneMoreTest is accurate: for 70.79% of the 178 real-world faulty methods involved, the defect can be identified by manually verifying only a single test per method according to OneMoreTest’s suggestions. Compared against the state of the art, OneMoreTest improved precision from 46.63% to 72.62% and recall from 46.63% to 70.79%.
Vol. 51, no. 8, pp. 2346-2365.
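The core ranking step (flag tests whose actual output deviates most from a surrogate model's prediction) can be illustrated with a toy sketch. This is not the authors' code; the linear surrogate and the seeded fault below are invented for illustration:

```python
# Illustrative sketch of OneMoreTest's selection idea: a surrogate model
# trained on passing input/output pairs predicts the method-under-test's
# output, and new tests are ranked by how far the actual output deviates
# from the prediction; the top-k most "surprising" tests go to the developer.

def rank_suspicious(inputs, mut, surrogate, k=3):
    """Return the k inputs whose actual output deviates most from the surrogate."""
    scored = []
    for x in inputs:
        deviation = abs(mut(x) - surrogate(x))
        scored.append((deviation, x))
    scored.sort(reverse=True)          # largest deviation first
    return [x for _, x in scored[:k]]

# Toy MUT with a seeded fault at x == 7; the surrogate is a linear stand-in
# for the trained neural network, which learned the fault-free behavior.
mut = lambda x: x * 2 + (100 if x == 7 else 0)
surrogate = lambda x: x * 2
suspicious = rank_suspicious(range(10), mut, surrogate, k=1)
```

Because the surrogate only ever saw passing behavior, the faulty input surfaces as the single largest deviation, so verifying just one suggested test exposes the defect.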
Citations: 0
Enriching Mutation Testing With Innovative Method Invocation Mutation: Filling the Crucial Missing Piece of the Puzzle
IF 6.5, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-06-19 DOI: 10.1109/TSE.2025.3573751
Peng Zhang;Zeyu Lu;Yang Wang;Yibiao Yang;Yuming Zhou;Mike Papadakis
Abstract: Mutation testing aims to simulate real-world defects, but existing tools often struggle to replicate method invocation defects accurately. To address this, we propose MIN (Method INvocation mutator), which uses a mapping strategy to pair method names with corresponding values, ensuring that methods share argument and return types. This approach enhances the feasibility and realism of mutants by considering factors such as library methods, access control, inheritance, and static methods. Experimental results show that integrating MIN into Major (a popular mutation tool) improves semantic similarity to real defects by 11%, increases mutant set diversity to 97.5%, and reduces undetected faults by 38.5%. Furthermore, MIN’s performance rivals that of state-of-the-art machine-learning-based mutators like CodeBERT, with a 10x speed advantage over CodeBERT and 4x over DeepMutation in generating compilable mutants. These findings demonstrate that MIN can significantly enhance defect simulation and improve the efficiency of mutation testing.
Vol. 51, no. 7, pp. 2125-2143.
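The key constraint (a replacement method must share argument and return types with the original, so the mutant still compiles) can be sketched in miniature. MIN targets Java; this hypothetical sketch uses Python's `ast` module only to show the swap mechanics:

```python
# Illustrative sketch of method-invocation mutation: replace one method call
# with another that shares argument and return types, so the mutant remains
# compilable. MIN works on Java; this toy uses Python's ast module, and
# str.upper / str.lower stand in for a signature-compatible method pair.

import ast

def mutate_call(source: str, old: str, new: str) -> str:
    """Swap every zero-argument method call named `old` for `new`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == old):
            node.func.attr = new          # the actual mutation
    return ast.unparse(tree)

mutant = mutate_call("def shout(s):\n    return s.upper()", "upper", "lower")
```

Both `upper` and `lower` take no arguments and return a string, so the mutant type-checks; swapping in, say, `split` (which returns a list) would violate MIN's signature-compatibility constraint.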
Citations: 0
Boosting Generalizable Fairness With Mahalanobis Distances Guided Boltzmann Exploratory Testing
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-06-19 DOI: 10.1109/TSE.2025.3581402
Kaixiang Dong;Peng Wu;Yanting Chen
Abstract: Although machine learning models have been remarkably effective for decision-making tasks such as employment, insurance, and criminal justice, it remains urgent yet challenging to ensure model predictions are reliable and socially fair. This amounts to detecting and repairing potential discriminatory defects of machine learning models extensively with authentic testing data. In this paper, we propose a novel Mahalanobis distance guided Adaptive Exploratory Fairness Testing (MAEFT) approach, which searches for individual discriminatory instances (IDIs) through deep reinforcement learning with an adaptive extension of Boltzmann exploration, significantly reducing overestimation. MAEFT uses Mahalanobis distances to guide the search with realistic correlations between input features. By learning a more accurate state-action value approximation, MAEFT can reach a much wider valid input space, sharply reducing the number of duplicate instances visited, and identify more unique tests and IDIs calibrated to the realistic feature correlations. Compared with state-of-the-art black-box and white-box fairness testing methods, our approach generates on average 4.65%-161.66% more unique tests and identifies 154.60%-634.80% more IDIs, with a performance speed-up of 12.54%-1313.47%. Moreover, the IDIs identified by MAEFT can be exploited to repair the original models through retraining: on average, they lead to a 59.15% boost in model fairness, 15.94%-48.73% higher than the boost from IDIs identified by state-of-the-art fairness testing methods. The models retrained with MAEFT also exhibit 37.66%-46.81% stronger generalization ability than those retrained with the state-of-the-art fairness testing methods.
Vol. 51, no. 8, pp. 2213-2231.
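The distance that guides MAEFT's search is standard: Mahalanobis distance weights feature differences by the inverse covariance matrix, so search steps respect realistic feature correlations instead of treating features as independent. A minimal two-feature sketch (illustrative only, with a hand-supplied inverse covariance):

```python
# Minimal sketch of the Mahalanobis distance used to guide the search:
# sqrt((x - mean)^T * S_inv * (x - mean)), expanded here for two features
# with a precomputed 2x2 inverse covariance matrix S_inv.

import math

def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance of 2-D point x from mean, given inverse covariance."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    # quadratic form d^T * S_inv * d, written out for a 2x2 matrix
    q = (d[0] * (inv_cov[0][0] * d[0] + inv_cov[0][1] * d[1])
         + d[1] * (inv_cov[1][0] * d[0] + inv_cov[1][1] * d[1]))
    return math.sqrt(q)

mean = (0.0, 0.0)
# With the identity inverse covariance (uncorrelated, unit-variance features),
# Mahalanobis reduces to plain Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = mahalanobis((3.0, 4.0), mean, identity)          # -> 5.0
# With correlated features, the same point gets a different distance.
correlated = [[2.0, -1.0], [-1.0, 2.0]]
d_corr = mahalanobis((3.0, 4.0), mean, correlated)
```

In a testing context, candidate inputs that are Mahalanobis-close to observed data are more realistic, which is why the identified IDIs transfer better to model repair.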
Citations: 0
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair
IF 5.6, CAS Tier 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-06-18 DOI: 10.1109/TSE.2025.3581062
André Silva;Sen Fang;Martin Monperrus
Abstract: Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions that have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers a state-of-the-art parameter-efficient fine-tuning (PEFT) technique for program repair. The result is a highly effective “program repair adapter” for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program-repair-specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.
Vol. 51, no. 8, pp. 2366-2380. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11039501
Citations: 0