Na Li, Tianao Li, Zhaorui Ma, Xinhao Hu, Shicheng Zhang, Fenlin Liu, Xiaowen Quan, Xiangyang Luo, Guoming Ren, Hao Feng, Shubo Zhang
DOI: 10.1016/j.ipm.2024.103810
Journal: Information Processing & Management (Impact Factor 7.4; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-06-25
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0306457324001699
Citations: 0
HpGraphNEI: A network entity identification model based on heterophilous graph learning
Abstract
Network entities have important applications in asset mapping, vulnerability analysis, and service delivery. In cyberspace, where the network structure is complex and the number of entities is large, effectively obtaining the relevant attributes of entities is difficult. Graph neural network-based approaches focus on passing messages to the target IP node from its neighboring nodes; however, they ignore the heterophilous relationships that arise in the graph structure of network entity identification (NEI) tasks and fail to pass messages effectively from non-neighboring nodes. To address these limitations, we propose HpGraphNEI, an NEI model based on heterophilous graph learning. HpGraphNEI converts the heterophilous graphs of the NEI task into homophilous graphs and uses graph learning to complete incomplete entity attributes. First, features are extracted from the acquired dataset by network measurement, and a clustering algorithm divides the target nodes into communities. Second, a network topology graph is constructed, and node attribute information and neighborhood structure information are embedded into the graph as feature vectors. Third, global attention within each community is calculated from the attention results, the strongly correlated edges in the network are filtered, the adjacency matrix is reconstructed, and the updated node information is aggregated to complete the missing attributes. Fourth, the updated nodes are classified to output network entity categories, and network entity portraits are constructed from the attribute-completed nodes. We collected data for two months in three real-world regions and successfully identified six types of network entities. Compared with the best baseline, all metrics improve significantly: NEI accuracy is above 93.74% and reaches 96.28%, an improvement of 2.27% to 2.69%.
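The attention, edge-filtering, and aggregation steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `rewire_and_complete`, the use of scaled dot-product scores as the attention, the reading of "filtered" as keeping the `top_k` strongest edges per node, and the split into always-observed measurement features (`measure`) and possibly missing attributes (`attrs`) are all assumptions made for illustration.

```python
import numpy as np


def rewire_and_complete(measure, attrs, known, communities, top_k=2):
    """Rebuild the adjacency inside each community, then fill missing attributes.

    measure:     (n, d) measurement features, observed for every node
    attrs:       (n, a) attribute vectors; rows where known is False are ignored
    known:       (n,) bool array, True where a node's attributes are observed
    communities: (n,) int community labels, e.g. from a clustering step
    All names and the scoring rule are hypothetical, not taken from the paper.
    """
    n, d = measure.shape
    # Scaled dot-product scores stand in for the attention (an assumption).
    scores = measure @ measure.T / np.sqrt(d)
    # A node may link to any other node of its community whose attributes are
    # known, whether or not the two were neighbors in the original topology.
    allowed = (
        (communities[:, None] == communities[None, :])
        & known[None, :]
        & ~np.eye(n, dtype=bool)
    )
    adjacency = np.zeros((n, n))
    for i in range(n):
        candidates = np.flatnonzero(allowed[i])
        if candidates.size == 0:
            continue  # no usable partner: the row stays all zeros
        # Keep only the top_k highest-scoring edges for this node.
        keep = candidates[np.argsort(scores[i, candidates])[-top_k:]]
        # Softmax over the kept edges so each row sums to 1.
        weights = np.exp(scores[i, keep] - scores[i, keep].max())
        adjacency[i, keep] = weights / weights.sum()
    # Zero out unknown rows so placeholder values (e.g. NaN) cannot leak in.
    attrs_known = np.where(known[:, None], attrs, 0.0)
    completed = attrs_known.copy()
    # Missing attributes become a weighted average over the kept partners.
    completed[~known] = adjacency[~known] @ attrs_known
    return adjacency, completed
```

Because candidate edges are drawn from the whole community rather than from the original neighbor lists, a node can aggregate from non-neighboring nodes that resemble it, which is the intuition behind turning a heterophilous graph into a more homophilous one. A node with no known partner in its community keeps an all-zero attribute row in this sketch.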
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.