Towards Robust Models of Code via Energy-Based Learning on Auxiliary Datasets

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. Pub Date: 2022-10-10. DOI: 10.1145/3551349.3561171

Nghi D. Q. Bui, Yijun Yu


Abstract

Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside a given distribution, which we refer to as out-of-distribution (OOD) samples. To address this, we propose using an auxiliary out-of-distribution dataset that, when trained together with the main dataset, enhances the model's robustness. To incorporate such OOD samples into the training of source code models, we adapt an energy-bounded learning objective that assigns a higher score to in-distribution samples and a lower score to out-of-distribution samples. Our evaluation on OOD detection and adversarial sample detection shows that the approach makes existing source code models more robust: they become more accurate at recognizing OOD data while also becoming more resistant to adversarial attacks.
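The abstract does not spell out the objective. The following is only a minimal sketch, assuming the energy-bounded formulation from energy-based OOD detection (Liu et al., NeurIPS 2020). In that formulation, the energy of an input is E(x) = -T · log Σ_i exp(f_i(x)/T) over the classifier logits f(x). The detection score is -E(x), which is higher for in-distribution inputs. Training adds a squared-hinge penalty that pushes main-dataset energies below a margin m_in and auxiliary-dataset energies above a margin m_out. The function names, the margin values, the temperature T and the weight lam are illustrative assumptions, not values from the paper. Logits are plain Python lists so the sketch runs without a deep-learning framework.

```python
import math
from typing import List, Sequence

# Sketch only: assumes the energy-bounded objective from energy-based OOD
# detection; all names, margins and weights here are hypothetical.


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) via max subtraction."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * logsumexp(f(x) / T); lower means more in-distribution."""
    return -temperature * logsumexp([z / temperature for z in logits])


def id_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Detection score -E(x): higher for in-distribution, lower for OOD inputs."""
    return -energy(logits, temperature)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one labeled example from the main dataset."""
    return logsumexp(logits) - logits[label]


def energy_bounded_loss(
    in_logits: List[Sequence[float]],
    out_logits: List[Sequence[float]],
    m_in: float,
    m_out: float,
    temperature: float = 1.0,
) -> float:
    """Squared hinge: penalize main-set energies above m_in and
    auxiliary (OOD) energies below m_out."""
    in_term = sum(max(0.0, energy(z, temperature) - m_in) ** 2 for z in in_logits)
    out_term = sum(max(0.0, m_out - energy(z, temperature)) ** 2 for z in out_logits)
    return in_term / len(in_logits) + out_term / len(out_logits)


def total_loss(
    in_logits: List[Sequence[float]],
    labels: List[int],
    out_logits: List[Sequence[float]],
    m_in: float,
    m_out: float,
    lam: float = 0.1,
    temperature: float = 1.0,
) -> float:
    """Classification loss on the main set plus the weighted energy regularizer."""
    ce = sum(cross_entropy(z, y) for z, y in zip(in_logits, labels)) / len(in_logits)
    return ce + lam * energy_bounded_loss(in_logits, out_logits, m_in, m_out, temperature)


if __name__ == "__main__":
    main_batch = [[6.0, 0.5, -1.0], [0.2, 5.0, 0.1]]  # made-up logits, main dataset
    labels = [0, 1]
    aux_batch = [[3.0, 0.0, 0.0]]  # made-up logits, confidently classified OOD sample
    print(id_score(main_batch[0]) > id_score(aux_batch[0]))  # True
    print(total_loss(main_batch, labels, aux_batch, m_in=-5.0, m_out=-2.0))
```

With these made-up logits, both main-dataset samples already sit below m_in. Only the auxiliary sample contributes to the regularizer, because its energy (about -3.09) is below m_out = -2. Minimizing the loss would raise that sample's energy and lower its score relative to in-distribution code. In a real setup, the logits would come from the code model's classification head, and an optimizer in a training framework would minimize the loss.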
