Finding the Source in Networks: An Approach Based on Structural Entropy

IF 4.1 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology Pub Date : 2023-02-06 DOI:10.1145/3568309

Chong Zhang, Qiang Guo, Luoyi Fu, Jiaxin Ding, Xinde Cao, Fei Long, Xinbing Wang, Cheng Zhou

Volume 23, Issue 1, Pages 1-25

Citations: 0

Abstract

The popularity of intelligent devices provides straightforward access to the Internet and online social networks. However, the quick and easy data updates that networks provide also facilitate the spread of risks such as rumors, malware, and computer viruses. To this end, this article studies the problem of source detection: inferring the source node from the aftermath of a cascade, that is, from the observed infected graph G_N of the network at some time. Prior work has adopted various statistical quantities, such as degree, distance, or infection size, to reflect the structural centrality of the source. In this article, we propose a new metric, which we call the infected tree entropy (ITE), that exploits richer underlying structural features for source detection. The idea of ITE is inspired by the concept of structural entropy [21], which demonstrated that minimizing the average number of bits needed to encode a network's structure under different partitions is the principle for detecting the natural, or true, structures in real-world networks. Accordingly, our proposed ITE-based estimator of the source tries to minimize the coding of the network partitions induced by the infected trees rooted at each potential source, thus minimizing the structural deviation between the cascades from the potential sources and the actual infection process contained in G_N. On polynomially growing geometric trees, as tree heterogeneity increases, the ITE estimator yields more reliable detection at only moderate infection sizes and achieves asymptotically complete detection. In contrast, for regular expanding trees, the ITE estimator still retains a guaranteed detection probability even with an infinite infection size, thanks to the degree regularity property. We also realize ITE-based detection algorithmically, with linear time complexity, via a message-passing scheme, and further extend it to general graphs. Extensive experiments on synthetic and real datasets confirm the superiority of ITE over the baselines. For example, on scale-free networks ITE ranks the source among the top 10% with an accuracy of 85%, far exceeding the 55% achieved by the classic algorithm.
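The abstract does not give the ITE formula, so the following is only a minimal sketch of the general pattern it describes: score every candidate source of an infected tree in linear time with two message-passing sweeps, then pick the best-scoring node. As a stand-in score it uses the sum of distances to all infected nodes, one of the classic "distance" quantities the abstract attributes to prior work; it is not the ITE estimator. The function name `distance_center` and the edge-list input format are assumptions made for illustration, and the input must be a tree with at least one edge.

```python
from collections import defaultdict


def distance_center(edges):
    """Score every node of an undirected tree by its total distance to all
    other nodes, in O(N) time, and return (best_node, scores).

    NOTE: this is a distance-based baseline used for illustration only;
    it is NOT the infected tree entropy (ITE) from the article.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    n = len(nodes)
    root = nodes[0]

    # Iterative DFS: record each node's parent and a root-first visiting order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                stack.append(w)

    # Upward sweep: each node sends its subtree size and the total distance
    # from itself to its subtree up to its parent.
    size = {u: 1 for u in nodes}
    down = {u: 0 for u in nodes}
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            size[p] += size[u]
            down[p] += down[u] + size[u]

    # Downward sweep: moving the root from p to child u brings size[u] nodes
    # one hop closer and pushes the other n - size[u] nodes one hop away.
    scores = {root: down[root]}
    for u in order[1:]:
        p = parent[u]
        scores[u] = scores[p] - size[u] + (n - size[u])

    best = min(nodes, key=lambda u: scores[u])
    return best, scores
```

For example, `distance_center([(0, 1), (1, 2), (2, 3), (3, 4)])` scores the five nodes of a path and selects the middle node. Replacing the stand-in score with a structural, entropy-based one is where an ITE-style estimator would differ; the details of that score are in the article, not in this sketch.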


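Source journal

ACM Transactions on Internet Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore

10.30

Self-citation rate

1.90%

Articles published

137

Review time

>12 weeks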

Journal description: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT will cover the results and roles of the individual disciplines and the relationships among them.