{"title":"基于bert的安全漏洞报告准确预测模型","authors":"Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang","doi":"10.1109/dsn-w54100.2022.00030","DOIUrl":null,"url":null,"abstract":"Bidirectional Encoder Representation from Transformers (Bert) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation on its adaptation guidelines in specialized fields. Here we focus on the software security domain. Early identification of security-related reports in software bug reports is one of the essential means to prevent security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and the complex characteristics of SBRs. So motivated, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on Bert (SbrPBert). By introducing a layer-based learning rate attenuation strategy and a fine-tuning method for freezing some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This means the practical value of the model in BUG tracking systems or projects that lack samples. 
Moreover, our model has detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects so far.","PeriodicalId":349937,"journal":{"name":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction\",\"authors\":\"Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang\",\"doi\":\"10.1109/dsn-w54100.2022.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bidirectional Encoder Representation from Transformers (Bert) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation on its adaptation guidelines in specialized fields. Here we focus on the software security domain. Early identification of security-related reports in software bug reports is one of the essential means to prevent security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and the complex characteristics of SBRs. So motivated, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on Bert (SbrPBert). By introducing a layer-based learning rate attenuation strategy and a fine-tuning method for freezing some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This means the practical value of the model in BUG tracking systems or projects that lack samples. 
Moreover, our model has detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects so far.\",\"PeriodicalId\":349937,\"journal\":{\"name\":\"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/dsn-w54100.2022.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/dsn-w54100.2022.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction
Bidirectional Encoder Representations from Transformers (BERT) has achieved impressive performance on several Natural Language Processing (NLP) tasks. However, guidelines for adapting it to specialized fields have received limited investigation. Here we focus on the software security domain. Early identification of security-related reports among software bug reports is one of the essential means of preventing security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and by the complex characteristics of SBRs. Motivated by this, we constructed the largest dataset in this field and propose a Security Bug Report Prediction Model Based on BERT (SbrPBert). By introducing a layer-based learning rate decay strategy and a fine-tuning method that freezes some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets, demonstrating its practical value in bug tracking systems and in projects that lack samples. Moreover, our model has so far detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects.
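The abstract names two fine-tuning tactics — layer-based learning rate decay and freezing some layers — without giving the exact schedule. The sketch below is a hypothetical illustration of how such per-layer rates are commonly assigned when fine-tuning a BERT encoder: the top layer keeps the base learning rate, each layer below is scaled by a decay factor, and frozen layers receive a rate of zero. The function name, decay factor, and number of frozen layers are assumptions for illustration, not values taken from the paper.

```python
def layerwise_learning_rates(num_layers, base_lr, decay, freeze_below=0):
    """Assign a learning rate to each encoder layer (hypothetical schedule).

    Layer 0 is the lowest layer (closest to the embeddings); the top
    layer keeps base_lr, and each layer beneath it is scaled by `decay`.
    Layers with index < freeze_below are frozen (learning rate 0.0),
    mimicking a fine-tuning method that freezes some layers.
    """
    rates = []
    for layer in range(num_layers):
        if layer < freeze_below:
            rates.append(0.0)  # frozen layer: no parameter updates
        else:
            depth_from_top = num_layers - 1 - layer
            rates.append(base_lr * decay ** depth_from_top)
    return rates

# Example: a 12-layer BERT-base encoder, top-layer LR 2e-5,
# per-layer decay 0.95, with the bottom 4 layers frozen.
rates = layerwise_learning_rates(12, 2e-5, 0.95, freeze_below=4)
```

In a typical PyTorch setup, these values would feed one optimizer parameter group per layer, so lower layers change slowly (or not at all) while upper layers adapt to the security-domain task.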