Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest
{"title":"Automated Program Repair: Emerging Trends Pose and Expose Problems for Benchmarks","authors":"Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest","doi":"10.1145/3704997","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work, which complicates the task of ensuring that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated. This paper reviews work in APR published in the field’s top five venues since 2018, emphasizing emerging trends in the field, including the dramatic rise of ML models, including LLMs. ML-based papers are categorized along structural and functional dimensions, and a variety of issues are identified that these new methods raise. Importantly, data leakage and contamination concerns arise from the challenge of validating ML-based APR using existing benchmarks, which were designed before these techniques were popular. We discuss inconsistencies in evaluation design and performance reporting and offer pointers to solutions where they are available. Finally, we highlight promising new directions that the field is already taking.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"87 1","pages":""},"PeriodicalIF":23.8000,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3704997","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Machine learning (ML) pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important differences between these applications of ML and earlier work, which complicates the task of ensuring that results are valid and likely to generalize. A challenge is that the most popular APR evaluation benchmarks were not designed with ML techniques in mind. This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated. This paper reviews work in APR published in the field’s top five venues since 2018, emphasizing emerging trends in the field, including the dramatic rise of ML models, including LLMs. ML-based papers are categorized along structural and functional dimensions, and a variety of issues are identified that these new methods raise. Importantly, data leakage and contamination concerns arise from the challenge of validating ML-based APR using existing benchmarks, which were designed before these techniques were popular. We discuss inconsistencies in evaluation design and performance reporting and offer pointers to solutions where they are available. Finally, we highlight promising new directions that the field is already taking.
期刊介绍:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.