SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction

Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang
{"title":"SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction","authors":"Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang","doi":"10.1109/dsn-w54100.2022.00030","DOIUrl":null,"url":null,"abstract":"Bidirectional Encoder Representation from Transformers (Bert) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation on its adaptation guidelines in specialized fields. Here we focus on the software security domain. Early identification of security-related reports in software bug reports is one of the essential means to prevent security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and the complex characteristics of SBRs. So motivated, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on Bert (SbrPBert). By introducing a layer-based learning rate attenuation strategy and a fine-tuning method for freezing some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This means the practical value of the model in BUG tracking systems or projects that lack samples. Moreover, our model has detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects so far.","PeriodicalId":349937,"journal":{"name":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/dsn-w54100.2022.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has achieved impressive performance on several Natural Language Processing (NLP) tasks. However, there has been limited investigation into guidelines for adapting it to specialized fields. Here we focus on the software security domain. Early identification of security-related reports among software bug reports is an essential means of preventing security incidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and by the complex characteristics of SBRs. Motivated by this, we constructed the largest dataset in this field and propose a BERT-based security bug report prediction model (SbrPBert). By introducing a layer-based learning-rate attenuation strategy and a fine-tuning method that freezes some layers, our model outperforms the baseline models on both our dataset and other small-sample datasets. This demonstrates the model's practical value for bug tracking systems or projects that lack samples. Moreover, through deployment on the Mozilla and RedHat projects, our model has so far detected 56 hidden vulnerabilities.
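
The two adaptation techniques named in the abstract, layer-based learning-rate attenuation and freezing some layers, can be illustrated roughly as follows. This is a minimal sketch assuming a Hugging Face `BertForSequenceClassification` backbone; `NUM_FROZEN_LAYERS`, `BASE_LR`, and `DECAY` are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of (1) layer-wise learning-rate decay and (2) partial layer
# freezing for BERT fine-tuning. Hyperparameters below are assumptions for
# illustration only, not the values used by the SbrPBert authors.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # binary: SBR vs. non-SBR

NUM_FROZEN_LAYERS = 4   # assumed: freeze embeddings + lowest 4 encoder layers
BASE_LR = 2e-5          # assumed learning rate for the top layer and head
DECAY = 0.95            # assumed per-layer decay factor

# Freeze the embedding layer and the lowest encoder layers.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:NUM_FROZEN_LAYERS]:
    for param in layer.parameters():
        param.requires_grad = False

# Give each remaining encoder layer a learning rate that decays
# geometrically from the top layer downward.
num_layers = len(model.bert.encoder.layer)  # 12 for bert-base
param_groups = []
for i, layer in enumerate(model.bert.encoder.layer[NUM_FROZEN_LAYERS:],
                          start=NUM_FROZEN_LAYERS):
    lr = BASE_LR * (DECAY ** (num_layers - 1 - i))
    param_groups.append({"params": layer.parameters(), "lr": lr})

# The classification head trains at the full base learning rate.
param_groups.append({"params": model.classifier.parameters(), "lr": BASE_LR})

optimizer = torch.optim.AdamW(param_groups, lr=BASE_LR)
```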