Jiamou Sun, Zhenchang Xing, Xiwei Xu, Liming Zhu, Qinghua Lu
Venue: 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)
DOI: 10.1109/ICSME55016.2022.00024 (https://doi.org/10.1109/ICSME55016.2022.00024)
Published: 2022-10-01
Citations: 2
Heterogeneous Vulnerability Report Traceability Recovery by Vulnerability Aspect Matching
Security databases describe the characteristics of discovered vulnerabilities in text to support future study and patching. However, because different maintainers have different perspectives on vulnerabilities, they often describe the same vulnerability in different ways, which makes it hard to gather comprehensive information about a vulnerability from different databases. To mitigate this problem, Common Vulnerabilities and Exposures (CVE) was established to identify each publicly disclosed vulnerability with a unique CVE id, and vulnerability databases maintained by different vendors and organizations can reference these CVE ids in their vulnerability reports. Despite the wide adoption of CVEs, traceability issues are still prevalent. Our empirical study on vulnerability traceability across four representative security databases (NVD, IBM X-Force, ExploitDB, Openwall) shows that, as the number of CVE records grows rapidly, traceability delays and missing traceability links have become severe for these databases. To address these issues, we develop an automatic traceability recovery method that recommends related external vulnerability reports for the reports in one database. Because vulnerability reports from different databases differ in content details and length, our approach does not match reports at the document level; instead, it extracts seven distinctive key vulnerability aspects that are widely present in vulnerability descriptions and matches reports on those aspects. As a proof of concept, we apply our method to recommend reports from IBM X-Force, ExploitDB and Openwall for NVD reports. We use NVD as the target because it is a de facto standard vulnerability database that contains the most comprehensive list of vulnerabilities. Our experiments on a wide range of NLP methods show that our aspect-level matching can achieve high MRR and accuracy for traceability recovery across heterogeneous vulnerability databases.
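The idea of matching reports aspect by aspect, and of scoring the resulting ranking with MRR (mean reciprocal rank), can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the seven aspects or the NLP models used, so the aspect keys (`product`, `version`, `vulnerability_type`), the report ids, the sample texts, and the token-level Jaccard similarity below are all assumptions chosen for illustration only.

```python
from typing import Dict, List

# A report is modelled as a mapping from aspect name to extracted text.
# The aspect names used below are hypothetical; the abstract does not list them.
Report = Dict[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two aspect strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def aspect_score(target: Report, candidate: Report) -> float:
    """Average similarity over the aspects present in both reports."""
    shared = [k for k in target if k in candidate]
    if not shared:
        return 0.0
    return sum(jaccard(target[k], candidate[k]) for k in shared) / len(shared)


def rank_candidates(target: Report, candidates: Dict[str, Report]) -> List[str]:
    """Return candidate ids ordered from best to worst aspect-level match."""
    return sorted(
        candidates,
        key=lambda cid: aspect_score(target, candidates[cid]),
        reverse=True,
    )


def mean_reciprocal_rank(rankings: List[List[str]], truths: List[str]) -> float:
    """Mean of 1/rank of the correct item per query; 0 when it is not ranked."""
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths) if truths else 0.0


# Made-up example: one target report and two external candidate reports.
nvd_report: Report = {
    "product": "acme webserver",
    "version": "2.4.1",
    "vulnerability_type": "buffer overflow",
}
external_reports: Dict[str, Report] = {
    "ext-1": {
        "product": "acme webserver",
        "version": "2.4.1",
        "vulnerability_type": "stack buffer overflow",
    },
    "ext-2": {
        "product": "other cms",
        "version": "1.0",
        "vulnerability_type": "sql injection",
    },
}

ranking = rank_candidates(nvd_report, external_reports)
mrr = mean_reciprocal_rank([ranking], ["ext-1"])
```

Comparing per-aspect fields rather than whole documents keeps a long exploit write-up and a one-sentence database entry comparable, which is the motivation the abstract gives for avoiding document-level matching. A learned similarity model could replace `jaccard` without changing the ranking or MRR code.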