PIONEER: improving the robustness of student models when compressing pre-trained models of code

IF 3.1 · CAS Partition 2 (Computer Science) · JCR Q3 (Computer Science, Software Engineering)
Xiangyue Liu, Xinwei Liu, Lili Bo, Xiaoxue Wu, Yun Yang, Xiaobing Sun, Feng Zhou
DOI: 10.1007/s10515-025-00560-2 · Automated Software Engineering, 33(1) · Published 2025-09-23 (Journal Article)
Full text: https://link.springer.com/article/10.1007/s10515-025-00560-2
Citations: 0

Abstract

Pre-trained models of code have shown significant effectiveness in a variety of software engineering tasks, but their large size makes them difficult to deploy locally. Existing works mainly focus on compressing these large models into small models that achieve similar performance and efficient inference. However, they overlook that the small models should also be robust enough to handle adversarial examples, which lead to incorrect predictions for users. Knowledge distillation techniques typically transform the model compression problem into a combinatorial optimization problem over the student architecture space to achieve the best student model performance, but they can only improve the robustness of the student model to a limited extent through traditional adversarial training. This paper proposes PIONEER (ImProvIng the RObustness of StudeNt ModEls WhEn CompRessing Code Models), a novel knowledge distillation technique that enhances the robustness of the student model without requiring adversarial training. PIONEER incorporates robustness evaluation during distillation to guide the optimization of the student model architecture. By using the probability distributions of both original and adversarial examples as soft labels, the student model learns the features of both during training. We conduct experimental evaluations on two downstream tasks (vulnerability prediction and clone detection) for three models (CodeBERT, GraphCodeBERT, and CodeT5). We use PIONEER to compress six downstream task models into small (3 MB) models that are 206× smaller than the originals. The results show that the compressed models reduce inference latency (by 76×) and improve model robustness (by 87.54%) with a negligible loss of effectiveness (1.67%).

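The dual soft-label idea described in the abstract can be illustrated with a minimal sketch (assuming PyTorch; this is not the authors' implementation, and names such as distill_loss, temperature, and alpha are illustrative): the teacher's probability distributions on the original and the adversarial version of each input serve as soft labels that the student is trained to match.

```python
# Minimal sketch (assumed PyTorch) of distillation with soft labels from both
# original and adversarial examples. Function and parameter names are
# illustrative, not taken from the PIONEER paper.
import torch
import torch.nn.functional as F

def distill_loss(student_logits_orig, student_logits_adv,
                 teacher_logits_orig, teacher_logits_adv,
                 temperature=2.0, alpha=0.5):
    """Mix KL divergence to the teacher on original and adversarial inputs."""
    def kl(student_logits, teacher_logits):
        # Soft labels: teacher probabilities at the chosen temperature.
        t_probs = F.softmax(teacher_logits / temperature, dim=-1)
        s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

    loss_orig = kl(student_logits_orig, teacher_logits_orig)
    loss_adv = kl(student_logits_adv, teacher_logits_adv)
    return alpha * loss_orig + (1.0 - alpha) * loss_adv
```

In such a setup, alpha would balance fidelity on clean inputs against robustness on adversarial ones; the paper's actual objective and weighting may differ.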

Source journal
Automated Software Engineering (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 4.80
Self-citation rate: 11.80%
Articles per year: 51
Review time: >12 weeks
About the journal: This journal details research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.