Adversarial Robustness of Deep Code Comment Generation

ACM Transactions on Software Engineering and Methodology (TOSEM), Pub Date: 2021-07-31, DOI: 10.1145/3501256

Yu Zhou, Xiaoqing Zhang, Juanjuan Shen, Tingting Han, Taolue Chen, H. Gall


Citations: 22

Abstract

Deep neural networks (DNNs) have shown remarkable performance in a variety of domains such as computer vision, speech recognition, and natural language processing. Recently, they have also been applied to various software engineering tasks, typically ones that involve processing source code. DNNs are well known to be vulnerable to adversarial examples, i.e., fabricated inputs that can cause the DNN model to misbehave while being perceived as benign by humans. In this paper, we focus on the code comment generation task in software engineering and study the robustness of DNNs applied to this task. We propose ACCENT (Adversarial Code Comment gENeraTor), an identifier substitution approach to crafting adversarial code snippets that are syntactically correct and semantically close to the original code snippet, but may mislead the DNNs into producing completely irrelevant code comments. To improve robustness, ACCENT also incorporates a novel training method, which can be applied to existing code comment generation models. We conduct comprehensive experiments to evaluate our approach by attacking the mainstream encoder-decoder architectures on two large-scale publicly available datasets. The results show that ACCENT efficiently produces stable attacks with functionality-preserving adversarial examples, and that the generated examples have better transferability than those of the baselines. We also confirm, via experiments, that our training method is effective in improving model robustness.
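To make the idea of identifier substitution concrete, the following is a minimal sketch, not the authors' implementation. It shows only the mechanical step of renaming identifiers so that the snippet stays syntactically valid and behaves the same. How ACCENT chooses which identifiers to replace and which substitutes to use is not described in the abstract, so the mapping here is supplied by hand. The names `RenameIdentifiers`, `substitute_identifiers`, and the example function `total` are hypothetical, and the sketch assumes Python 3.9+ for `ast.unparse`.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Rename variables and parameters according to a fixed mapping.

    Assumption: the mapping is collision-free and the code does not pass
    the renamed parameters as keyword arguments at call sites.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Covers function parameters; also visit any annotation.
        self.generic_visit(node)
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def substitute_identifiers(source, mapping):
    """Return `source` with identifiers renamed as given by `mapping`."""
    tree = ast.parse(source)
    tree = RenameIdentifiers(mapping).visit(tree)
    return ast.unparse(tree)


# Hypothetical input snippet and hand-picked, misleading substitutes.
original = """
def total(prices, tax):
    result = 0
    for p in prices:
        result += p
    return result * (1 + tax)
"""

mapping = {"prices": "files", "result": "buffer", "p": "node"}
adversarial = substitute_identifiers(original, mapping)
print(adversarial)
```

The renamed snippet computes the same value as the original, yet its identifiers now suggest file handling rather than pricing, which is the kind of surface change a comment generation model might be sensitive to. A real attack would additionally need a strategy for selecting substitutes against a target model, which this sketch leaves out.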

