Jiamou Sun, Zhenchang Xing, Xiwei Xu, Liming Zhu, Qinghua Lu
{"title":"Heterogeneous Vulnerability Report Traceability Recovery by Vulnerability Aspect Matching","authors":"Jiamou Sun, Zhenchang Xing, Xiwei Xu, Liming Zhu, Qinghua Lu","doi":"10.1109/ICSME55016.2022.00024","DOIUrl":null,"url":null,"abstract":"Security databases describe characteristics of discovered vulnerabilities in text for future studying and patching. However, due to different maintainers having different perspectives about vulnerabilities, they often describe the same vulnerability in different ways, creating obstacles for gathering comprehensive information about the vulnerabilities from different databases. To mitigate this problem, Common Vulnerability and Exposures (CVE) is established to identify each publicly disclosed vulnerability with a unique CVE id, and vulnerability databases by different vendors and organizations can reference the CVE ids in their vulnerability reports. In spite of the wide adoption of CVEs, traceability issues are still prevalent. Our empirical study on vulnerability traceability across four representative security databases (NVD, IBM X-Force, ExploitDB, Openwall) shows that there was a fast-increasing amount of CVE records, traceability delay, and missing issues become severe for the vulnerability databases. To address these issues, we develop an automatic traceability recovery method for recommending related external vulnerability reports to the reports in one database. As vulnerability reports from different databases differ in content details and length, our approach does not match the reports at the document level but extracts seven distinctive vulnerability key aspects that are widely present in vulnerability descriptions. As a proof of concept, we apply our methods to recommend the reports from IBM X-Force, ExploitDB and Openwall to the NVD report. We use NVD as the target because it is a de-facto standard vulnerability database that contains the most comprehensive list of vulnerabilities. Our experiments on a wide range of NLP methods show our aspect-level matching methods can achieve high MRR and accuracy for traceability recovery across heterogeneous vulnerability databases.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Security databases describe characteristics of discovered vulnerabilities in text for future studying and patching. However, due to different maintainers having different perspectives about vulnerabilities, they often describe the same vulnerability in different ways, creating obstacles for gathering comprehensive information about the vulnerabilities from different databases. To mitigate this problem, Common Vulnerability and Exposures (CVE) is established to identify each publicly disclosed vulnerability with a unique CVE id, and vulnerability databases by different vendors and organizations can reference the CVE ids in their vulnerability reports. In spite of the wide adoption of CVEs, traceability issues are still prevalent. Our empirical study on vulnerability traceability across four representative security databases (NVD, IBM X-Force, ExploitDB, Openwall) shows that there was a fast-increasing amount of CVE records, traceability delay, and missing issues become severe for the vulnerability databases. To address these issues, we develop an automatic traceability recovery method for recommending related external vulnerability reports to the reports in one database. As vulnerability reports from different databases differ in content details and length, our approach does not match the reports at the document level but extracts seven distinctive vulnerability key aspects that are widely present in vulnerability descriptions. As a proof of concept, we apply our methods to recommend the reports from IBM X-Force, ExploitDB and Openwall to the NVD report. We use NVD as the target because it is a de-facto standard vulnerability database that contains the most comprehensive list of vulnerabilities. Our experiments on a wide range of NLP methods show our aspect-level matching methods can achieve high MRR and accuracy for traceability recovery across heterogeneous vulnerability databases.