基于辅助数据集能量学习的鲁棒代码模型

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering Pub Date : 2022-10-10 DOI:10.1145/3551349.3561171

Nghi D. Q. Bui, Yijun Yu

{"title":"基于辅助数据集能量学习的鲁棒代码模型","authors":"Nghi D. Q. Bui, Yijun Yu","doi":"10.1145/3551349.3561171","DOIUrl":null,"url":null,"abstract":"Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside of a given distribution, which we refer to as out-of-distribution (OOD) samples. To this end, we propose to use an auxiliary dataset (out-of-distribution) such that, when trained together with the main dataset, they will enhance the model’s robustness. We adapt energy-bounded learning objective function to assign a higher score to in-distribution samples and a lower score to out-of-distribution samples in order to incorporate such out-of-distribution samples into the training process of source code models. In terms of OOD detection and adversarial samples detection, our evaluation results demonstrate a greater robustness for existing source code models to become more accurate at recognizing OOD data while being more resistant to adversarial attacks at the same time.","PeriodicalId":197939,"journal":{"name":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Robust Models of Code via Energy-Based Learning on Auxiliary Datasets\",\"authors\":\"Nghi D. Q. Bui, Yijun Yu\",\"doi\":\"10.1145/3551349.3561171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside of a given distribution, which we refer to as out-of-distribution (OOD) samples. To this end, we propose to use an auxiliary dataset (out-of-distribution) such that, when trained together with the main dataset, they will enhance the model’s robustness. We adapt energy-bounded learning objective function to assign a higher score to in-distribution samples and a lower score to out-of-distribution samples in order to incorporate such out-of-distribution samples into the training process of source code models. In terms of OOD detection and adversarial samples detection, our evaluation results demonstrate a greater robustness for existing source code models to become more accurate at recognizing OOD data while being more resistant to adversarial attacks at the same time.\",\"PeriodicalId\":197939,\"journal\":{\"name\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3551349.3561171\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3551349.3561171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

现有的改进源代码模型健壮性的方法集中于识别对抗性样本，而不是在给定分布之外的有效样本，我们将其称为分布外(OOD)样本。为此，我们建议使用辅助数据集(out- distribution)，这样，当与主数据集一起训练时，它们将增强模型的鲁棒性。我们采用能量有界学习目标函数，对分布内样本分配较高的分数，对分布外样本分配较低的分数，以便将分布外样本纳入源代码模型的训练过程中。在OOD检测和对抗性样本检测方面，我们的评估结果表明，现有源代码模型具有更强的鲁棒性，可以更准确地识别OOD数据，同时更能抵抗对抗性攻击。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards Robust Models of Code via Energy-Based Learning on Auxiliary Datasets

Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside of a given distribution, which we refer to as out-of-distribution (OOD) samples. To this end, we propose to use an auxiliary dataset (out-of-distribution) such that, when trained together with the main dataset, they will enhance the model’s robustness. We adapt energy-bounded learning objective function to assign a higher score to in-distribution samples and a lower score to out-of-distribution samples in order to incorporate such out-of-distribution samples into the training process of source code models. In terms of OOD detection and adversarial samples detection, our evaluation results demonstrate a greater robustness for existing source code models to become more accurate at recognizing OOD data while being more resistant to adversarial attacks at the same time.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering

自引率

0.00%

发文量