Exploitability prediction of vulnerabilities based on heterogeneous graphs

IF 7.6 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2025-09-22 DOI:10.1016/j.knosys.2025.114517

Guo Xu, Xin Chen, Xinxin Cai, Dongjin Yu

Volume 330, Article 114517. Full text: https://www.sciencedirect.com/science/article/pii/S0950705125015564

Citations: 0

Abstract


Exploitability prediction of vulnerabilities based on heterogeneous graphs

Vulnerability exploitability prediction assesses known software vulnerabilities to predict how likely each is to be exploited in real attacks. Many methods have been proposed for this task, but they generally suffer from two problems. First, they extract features only from a single vulnerability, ignoring the influence of associated vulnerabilities. Second, they usually aggregate different kinds of information with simple methods (such as concatenation), which may overlook important relationships between features. In this paper, we propose ExPreHet, a novel exploitability prediction method based on heterogeneous graphs. First, ExPreHet defines nodes and edges to construct a heterogeneous graph. After a series of preprocessing steps, it generates multiple attribute vectors for each node. Using a random walk with restart strategy, ExPreHet ensures that each node samples neighbors from every node category and groups the sampled neighbors by category. Then, ExPreHet aggregates all the attributes of each node into a content vector, and aggregates each category of the node's neighbors into a category vector. The content vector and all the category vectors are then aggregated into the final representation of the node. Finally, these final representations are fed into a random forest (RF) to train the classifier. To evaluate ExPreHet, we conduct experiments on a dataset of 66,877 vulnerabilities. The results show that ExPreHet achieves 83.24%, 83.22%, 83.28%, 83.25%, and 83.24% in accuracy, precision, recall, F1-score, and area under the curve (AUC), respectively, and performs significantly better than the baseline methods.
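The neighbor-sampling step in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the node types, the restart probability, or any sampling quota, so the toy node types ("vuln", "product", "weakness"), the node names, and the parameters `restart_prob`, `per_type`, and `max_steps` below are all hypothetical. The sketch only shows the general idea of a random walk with restart whose visited nodes are grouped by category.

```python
import random
from collections import defaultdict


def sample_neighbors_by_type(adj, node_type, start, restart_prob=0.5,
                             per_type=2, max_steps=1000, seed=0):
    """Random walk with restart from `start`, grouping visited nodes by type.

    `per_type` (a per-category quota) and `max_steps` are illustrative
    assumptions, not parameters taken from the paper.
    """
    rng = random.Random(seed)
    all_types = set(node_type.values())
    grouped = defaultdict(list)
    current = start
    for _ in range(max_steps):
        # Stop early once every category has reached its quota.
        if all(len(grouped[t]) >= per_type for t in all_types):
            break
        neighbors = adj.get(current, [])
        # Return to the start node on a dead end or with probability restart_prob.
        if not neighbors or rng.random() < restart_prob:
            current = start
            continue
        current = rng.choice(neighbors)
        bucket = grouped[node_type[current]]
        if current != start and current not in bucket and len(bucket) < per_type:
            bucket.append(current)
    return dict(grouped)


# Hypothetical toy graph: three vulnerabilities sharing a product and a weakness.
adj = {
    "v1": ["p1", "w1"],
    "v2": ["p1", "w1"],
    "v3": ["p1"],
    "p1": ["v1", "v2", "v3"],
    "w1": ["v1", "v2"],
}
node_type = {"v1": "vuln", "v2": "vuln", "v3": "vuln",
             "p1": "product", "w1": "weakness"}

groups = sample_neighbors_by_type(adj, node_type, "v1")
for category, nodes in sorted(groups.items()):
    print(category, nodes)
```

In this sketch the per-category quota is what keeps rarer node types from being crowded out by frequent ones; each resulting group would then be the input to one category-level aggregation.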


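Source journal

Knowledge-Based Systems (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months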

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.