{"title":"循环:跨编程语言的持续修复","authors":"Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, X. Hao, Hongzhi Yin","doi":"10.1145/3533767.3534219","DOIUrl":null,"url":null,"abstract":"Automatic Program Repair (APR) aims at fixing buggy source code with less manual debugging efforts, which plays a vital role in improving software reliability and development productivity. Recent APR works have achieved remarkable progress via applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language, as a result, to repair multiple languages, we have to build and train many repairing models. (2) Most of them are developed offline. Therefore, they won’t function when there are new-coming requirements. To address the above problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE). Specifically, (1) CIRCLE utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to strengthen CIRCLE’s continual learning ability further, preventing it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generated errors caused by crossing multiple programming languages. We train CIRCLE for four languages (i.e., C, JAVA, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only effectively and efficiently repairs multiple programming languages in continual learning settings, but also achieves state-of-the-art performance (e.g., fixes 64 Defects4J bugs) with a single repair model.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"CIRCLE: continual repair across programming languages\",\"authors\":\"Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, X. Hao, Hongzhi Yin\",\"doi\":\"10.1145/3533767.3534219\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic Program Repair (APR) aims at fixing buggy source code with less manual debugging efforts, which plays a vital role in improving software reliability and development productivity. Recent APR works have achieved remarkable progress via applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language, as a result, to repair multiple languages, we have to build and train many repairing models. (2) Most of them are developed offline. Therefore, they won’t function when there are new-coming requirements. To address the above problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE). Specifically, (1) CIRCLE utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to strengthen CIRCLE’s continual learning ability further, preventing it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generated errors caused by crossing multiple programming languages. We train CIRCLE for four languages (i.e., C, JAVA, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only effectively and efficiently repairs multiple programming languages in continual learning settings, but also achieves state-of-the-art performance (e.g., fixes 64 Defects4J bugs) with a single repair model.\",\"PeriodicalId\":412271,\"journal\":{\"name\":\"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3533767.3534219\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533767.3534219","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17
摘要
自动程序修复(APR)旨在通过较少的人工调试工作来修复有缺陷的源代码,这在提高软件可靠性和开发效率方面起着至关重要的作用。最近的APR工作通过应用深度学习(DL),特别是神经机器翻译(NMT)技术取得了显着进展。然而,我们观察到现有的基于dl的APR模型至少存在两个严重的缺陷:(1)大多数模型只能为单一编程语言生成补丁,因此,为了修复多种语言,我们必须构建和训练许多修复模型。(2)大部分是线下开发的。因此,当出现新的需求时,它们将无法发挥作用。针对上述问题,本文提出了一种基于t5的具有跨多种编程语言持续学习能力的APR框架,即跨编程语言持续修复(continuous Repair across programming languages, CIRCLE)。具体而言,(1)CIRCLE利用提示功能缩小了自然语言处理(NLP)预训练任务与APR之间的差距。(2)CIRCLE采用基于难度的预演策略,实现了APR的终身学习,无需获取完整的历史数据。(3)采用弹性正则化方法进一步增强CIRCLE的持续学习能力,防止其发生灾难性遗忘。(4) CIRCLE采用一种简单而有效的重修复方法,对跨多种编程语言产生的错误进行修正。我们针对四种语言(即C、JAVA、JavaScript和Python)训练CIRCLE,并在五种常用基准上对其进行评估。实验结果表明,CIRCLE不仅在持续学习设置中有效且高效地修复多种编程语言,而且还使用单个修复模型实现了最先进的性能(例如,修复64个缺陷4j错误)。
CIRCLE: continual repair across programming languages
Automatic Program Repair (APR) aims at fixing buggy source code with less manual debugging efforts, which plays a vital role in improving software reliability and development productivity. Recent APR works have achieved remarkable progress via applying deep learning (DL), particularly neural machine translation (NMT) techniques. However, we observe that existing DL-based APR models suffer from at least two severe drawbacks: (1) Most of them can only generate patches for a single programming language, as a result, to repair multiple languages, we have to build and train many repairing models. (2) Most of them are developed offline. Therefore, they won’t function when there are new-coming requirements. To address the above problems, a T5-based APR framework equipped with continual learning ability across multiple programming languages is proposed, namely ContInual Repair aCross Programming LanguagEs (CIRCLE). Specifically, (1) CIRCLE utilizes a prompting function to narrow the gap between natural language processing (NLP) pre-trained tasks and APR. (2) CIRCLE adopts a difficulty-based rehearsal strategy to achieve lifelong learning for APR without access to the full historical data. (3) An elastic regularization method is employed to strengthen CIRCLE’s continual learning ability further, preventing it from catastrophic forgetting. (4) CIRCLE applies a simple but effective re-repairing method to revise generated errors caused by crossing multiple programming languages. We train CIRCLE for four languages (i.e., C, JAVA, JavaScript, and Python) and evaluate it on five commonly used benchmarks. The experimental results demonstrate that CIRCLE not only effectively and efficiently repairs multiple programming languages in continual learning settings, but also achieves state-of-the-art performance (e.g., fixes 64 Defects4J bugs) with a single repair model.