CURE: Code-Aware Neural Machine Translation for Automatic Program Repair

Nan Jiang, Thibaud Lutellier, Lin Tan
{"title":"CURE: Code-Aware Neural Machine Translation for Automatic Program Repair","authors":"Nan Jiang, Thibaud Lutellier, Lin Tan","doi":"10.1109/ICSE43902.2021.00107","DOIUrl":null,"url":null,"abstract":"Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to automatically fix software bugs. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on searching for compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"168","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 168

Abstract

Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to automatically fix software bugs. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on searching for compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.
用于自动程序修复的代码感知神经机器翻译
自动程序修复(APR)是提高软件可靠性的关键。最近,神经机器翻译(NMT)技术被用于自动修复软件错误。虽然这些方法很有希望,但它们有两个主要的局限性。它们的搜索空间通常不包含正确的修复,而且它们的搜索策略忽略了诸如严格的代码语法之类的软件知识。由于这些限制,现有的基于nmt的技术不如基于模板的最佳方法。我们提出了一种新的基于nmt的APR技术,它有三个主要的新颖之处。首先,CURE在大型软件代码库上预训练编程语言(PL)模型,以便在APR任务之前学习类似于开发人员的源代码。其次,CURE设计了一种新的代码感知搜索策略,通过专注于搜索可编译补丁和长度接近错误代码的补丁来找到更多正确的修复。最后,CURE使用子词标记技术生成包含更多正确修复的更小的搜索空间。我们对两个广泛使用的基准测试的评估表明,CURE正确地修复了57个缺陷4j错误和26个QuixBugs错误,在这两个基准测试中都优于所有现有的APR技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信