Towards Vulnerability Types Classification Using Pure Self-Attention: A Common Weakness Enumeration Based Approach
Authors: Tianyi Wang, Shengzhi Qin, Kam-pui Chow
Published in: 2021 IEEE 24th International Conference on Computational Science and Engineering (CSE), pp. 146-153
Publication date: 2021-10-01
DOI: 10.1109/CSE53436.2021.00030
Citations: 6
Abstract
The rise in malicious cyberattacks has drawn attention to cybersecurity and vulnerabilities. Common Vulnerabilities and Exposures (CVE), a well-known cybersecurity vulnerability database, is often referenced as a standard in the cybersecurity field for both research and commercial purposes. Over the past decade, the development of Common Weakness Enumeration (CWE) has provided a useful vulnerability taxonomy for CVE entries. However, the process of generating CWE categories is entirely manual, so cybersecurity professionals face unpredictable waits for up-to-date information to be published. In this study, we introduce a new CWE-based method for classifying vulnerability types, built on the CVE dataset. Our method adopts a transformer encoder-decoder architecture and relies purely on the self-attention mechanism, without any convolution or recurrence. We first encode the CVE input entries to learn representative features and then decode them to classify vulnerability types according to the CWE standard. In the experiments, a fine-tuned, deep pre-trained Bidirectional Encoder Representations from Transformers (BERT) model automatically classifies unlabeled CVE candidates by vulnerability type and assigns CWE IDs. The proposed method outperforms all classical Natural Language Processing (NLP) baseline algorithms, achieving an accuracy of 90.74% on the test dataset. In addition, we expect the trained classification model to achieve considerable correctness at industry level when applied to real-life cyber threat intelligence articles and reports.
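The self-attention mechanism that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' model, which fine-tunes a pre-trained BERT; it is a single-head scaled dot-product self-attention written in plain Python. The toy token vectors and the identity projection matrices are hypothetical values chosen only for illustration.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token vectors x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Each token scores every token (including itself), scaled by sqrt(d_k).
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        for qi in q
    ]
    # Row-wise softmax turns the scores into attention weights.
    weights = [softmax(r) for r in scores]
    # Each output vector is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Hypothetical toy input: 3 tokens with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projections, not learned weights
output, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors. In a trained transformer the projection matrices are learned and several such heads run in parallel.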