Haipeng Chen, R. Liu, Noseong Park, V. S. Subrahmanian
Using Twitter to Predict When Vulnerabilities will be Exploited
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, published 2019-07-25
DOI: 10.1145/3292500.3330742
Citations: 30
Abstract
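When a new cyber-vulnerability is discovered, it is assigned a Common Vulnerabilities and Exposures (CVE) number. Malicious "exploits" may use these vulnerabilities to carry out attacks. Unlike prior work, which studies whether a CVE will be used in an exploit, we study the problem of predicting when an exploit is first seen. This question matters to system administrators, who must commit scarce resources to corrective action when a new vulnerability emerges. Moreover, past work assumes that CVSS scores (released by NIST) are available for prediction, but we show that, on average, 49% of real-world exploits occur before CVSS scores are published. This means that past approaches, which rely on CVSS scores, miss almost half of all exploits. In this paper, we propose a novel framework that uses Twitter discussion, without any CVSS score information, to predict when a vulnerability will be exploited. We introduce the concept of a family of CVE-Author-Tweet (CAT) graphs and build a novel set of features based on these graphs. We define recurrence relations capturing the "hotness" of tweets, the "expertise" of Twitter users on CVEs, and the "availability" of information about CVEs, and prove that these recurrences can be solved with a fixed-point algorithm. Our second innovation uses Hawkes processes to estimate the number of tweets and retweets related to each CVE. Using these two sets of features, we propose two ensemble forecast models, FEEU (for classification) and FRET (for regression), to predict when a CVE will be exploited. Compared with natural adaptations of past work (which predicts whether an exploit will be used), FEEU increases the F1 score by 25.1%, while FRET reduces the mean absolute error (MAE) by 37.2%.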
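The abstract says the hotness, expertise, and availability recurrences can be solved with a fixed-point algorithm, but it does not state the recurrences themselves. The Python sketch below is therefore only an illustration under assumed definitions, not the authors' formulas: each tweet has one author, mentions one or more CVEs, and carries a hypothetical prior score `base`; a tweet's hotness mixes that prior with its author's expertise and its CVEs' availability, where expertise and availability are taken to be mean hotness values. The names `Tweet`, `cat_fixed_point`, and the `damping` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str       # Twitter user who posted the tweet
    cves: list[str]   # CVE IDs the tweet mentions (assumed non-empty)
    base: float       # hypothetical prior score in [0, 1], e.g. a scaled retweet count


def _scores(tweets, hot):
    """Assumed definitions: expertise = mean hotness of an author's tweets,
    availability = mean hotness of the tweets that mention a CVE."""
    by_author, by_cve = {}, {}
    for h, t in zip(hot, tweets):
        by_author.setdefault(t.author, []).append(h)
        for c in t.cves:
            by_cve.setdefault(c, []).append(h)
    expertise = {a: sum(v) / len(v) for a, v in by_author.items()}
    avail = {c: sum(v) / len(v) for c, v in by_cve.items()}
    return expertise, avail


def cat_fixed_point(tweets, damping=0.5, tol=1e-9, max_iter=1000):
    """Iterate hypothetical hotness/expertise/availability recurrences to a fixed point.

    hot[i] = (1 - d) * base_i + d * (expertise[author_i] + mean avail of tweet i's CVEs) / 2
    Every averaging step has row sums of 1, so for 0 <= d < 1 the update is a
    contraction in the max-norm and the iteration converges to a unique fixed point.
    """
    if not 0 <= damping < 1:
        raise ValueError("damping must be in [0, 1)")
    if not tweets:
        return [], {}, {}
    hot = [t.base for t in tweets]
    for _ in range(max_iter):
        expertise, avail = _scores(tweets, hot)
        new_hot = [
            (1 - damping) * t.base
            + damping * (expertise[t.author] + sum(avail[c] for c in t.cves) / len(t.cves)) / 2
            for t in tweets
        ]
        delta = max(abs(x - y) for x, y in zip(new_hot, hot))
        hot = new_hot
        if delta < tol:
            break
    # Recompute expertise and availability from the converged hotness values.
    expertise, avail = _scores(tweets, hot)
    return hot, expertise, avail


# Usage: one author, two tweets about different (placeholder) CVEs with very different priors.
hot, expertise, avail = cat_fixed_point([Tweet("a", ["CVE-X"], 1.0), Tweet("a", ["CVE-Y"], 0.0)])
print([round(h, 3) for h in hot])  # → [0.833, 0.167]
```

Because each update averages hotness values and then scales the result by `damping` < 1, the gap between successive estimates shrinks geometrically; in this sketch that contraction is what guarantees convergence, and the paper's own proof may rest on different definitions.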
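The abstract says Hawkes processes are used to estimate the number of tweets and retweets related to each CVE, but it does not specify the kernel or how the model is fitted. The sketch below assumes a univariate Hawkes process with an exponential kernel, intensity lambda(t) = mu + sum over past tweets t_i of alpha*beta*exp(-beta*(t - t_i)), and treats mu (background rate), alpha (branching ratio), and beta (decay rate) as given rather than fitted. With m0 the current self-excitation and m_inf = alpha*mu/(1 - alpha), the expected number of tweets in the next window of length D is mu*D/(1 - alpha) + (m0 - m_inf)*(1 - exp(-beta*(1 - alpha)*D))/(beta*(1 - alpha)). The function names are hypothetical.

```python
import math


def excitation(events, t, alpha, beta):
    """Self-excitation at time t: sum of alpha*beta*exp(-beta*(t - t_i)) over events t_i <= t."""
    return sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti <= t)


def intensity(events, t, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + excitation (events at exactly t are counted)."""
    return mu + excitation(events, t, alpha, beta)


def expected_count(events, T, horizon, mu, alpha, beta):
    """Expected number of tweets in (T, T + horizon] given the observed history up to T.

    The kernel phi(s) = alpha*beta*exp(-beta*s) integrates to alpha (the branching ratio),
    so this closed form requires 0 <= alpha < 1 and beta > 0.
    """
    if not (0 <= alpha < 1 and beta > 0):
        raise ValueError("need 0 <= alpha < 1 and beta > 0")
    kappa = beta * (1 - alpha)               # decay rate of the expected excitation
    m_inf = alpha * mu / (1 - alpha)         # long-run expected excitation
    m0 = excitation(events, T, alpha, beta)  # excitation left by tweets observed so far
    return (mu + m_inf) * horizon + (m0 - m_inf) * (1 - math.exp(-kappa * horizon)) / kappa


# Usage with made-up tweet timestamps (hours since disclosure) and made-up parameters.
tweet_times = [0.0, 0.4, 0.5, 1.2, 3.0]
print(expected_count(tweet_times, T=4.0, horizon=24.0, mu=0.1, alpha=0.6, beta=1.5))
```

In practice mu, alpha, and beta would have to be estimated from each CVE's tweet stream (for example by maximum likelihood); that fitting step is omitted here, and the exponential kernel is a common choice rather than one confirmed by the abstract.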