{"title":"基于bert的安全漏洞报告准确预测模型","authors":"Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang","doi":"10.1109/dsn-w54100.2022.00030","DOIUrl":null,"url":null,"abstract":"Bidirectional Encoder Representation from Transformers (Bert) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation on its adaptation guidelines in specialized fields. Here we focus on the software security domain. Early identification of security-related reports in software bug reports is one of the essential means to prevent security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and the complex characteristics of SBRs. So motivated, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on Bert (SbrPBert). By introducing a layer-based learning rate attenuation strategy and a fine-tuning method for freezing some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This means the practical value of the model in BUG tracking systems or projects that lack samples. 
Moreover, our model has detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects so far.","PeriodicalId":349937,"journal":{"name":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction\",\"authors\":\"Xudong Cao, Tianwei Liu, Jiayuan Zhang, Mengyue Feng, Xin Zhang, Wanying Cao, Hongyu Sun, Yuqing Zhang\",\"doi\":\"10.1109/dsn-w54100.2022.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bidirectional Encoder Representation from Transformers (Bert) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation on its adaptation guidelines in specialized fields. Here we focus on the software security domain. Early identification of security-related reports in software bug reports is one of the essential means to prevent security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and the complex characteristics of SBRs. So motivated, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on Bert (SbrPBert). By introducing a layer-based learning rate attenuation strategy and a fine-tuning method for freezing some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This means the practical value of the model in BUG tracking systems or projects that lack samples. 
Moreover, our model has detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects so far.\",\"PeriodicalId\":349937,\"journal\":{\"name\":\"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/dsn-w54100.2022.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/dsn-w54100.2022.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction
Bidirectional Encoder Representations from Transformers (BERT) has achieved impressive performance on several Natural Language Processing (NLP) tasks. However, guidelines for adapting it to specialized fields have received limited investigation. Here we focus on the software security domain. Early identification of security-related reports among software bug reports is one of the essential means of preventing security accidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and by the complex characteristics of SBRs. Motivated by this, we constructed the largest dataset in this field and propose a Security Bug Report Prediction Model Based on BERT (SbrPBert). By introducing a layer-based learning rate decay strategy and a fine-tuning method that freezes some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets, demonstrating its practical value in bug tracking systems and in projects that lack samples. Moreover, our model has so far detected 56 hidden vulnerabilities through deployment on the Mozilla and RedHat projects.
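The abstract names two fine-tuning tactics — layer-based learning rate decay and freezing some layers — without giving the exact schedule. The sketch below is a hypothetical illustration of how such per-layer rates are commonly assigned when fine-tuning a BERT encoder: the top layer keeps the base learning rate, each layer below is scaled by a decay factor, and frozen layers receive a rate of zero. The function name, decay factor, and number of frozen layers are assumptions for illustration, not values taken from the paper.

```python
def layerwise_learning_rates(num_layers, base_lr, decay, freeze_below=0):
    """Assign a learning rate to each encoder layer (hypothetical schedule).

    Layer 0 is the lowest layer (closest to the embeddings); the top
    layer keeps base_lr, and each layer beneath it is scaled by `decay`.
    Layers with index < freeze_below are frozen (learning rate 0.0),
    mimicking a fine-tuning method that freezes some layers.
    """
    rates = []
    for layer in range(num_layers):
        if layer < freeze_below:
            rates.append(0.0)  # frozen layer: no parameter updates
        else:
            depth_from_top = num_layers - 1 - layer
            rates.append(base_lr * decay ** depth_from_top)
    return rates

# Example: a 12-layer BERT-base encoder, top-layer LR 2e-5,
# per-layer decay 0.95, with the bottom 4 layers frozen.
rates = layerwise_learning_rates(12, 2e-5, 0.95, freeze_below=4)
```

In a typical PyTorch setup, these values would feed one optimizer parameter group per layer, so lower layers change slowly (or not at all) while upper layers adapt to the security-domain task.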