Influence Maximization Revisited: Efficient Sampling with Bound Tightened

IF 2.2 2区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Database Systems Pub Date : 2022-08-18 DOI:https://dl.acm.org/doi/10.1145/3533817

Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, Jing Tang

{"title":"Influence Maximization Revisited: Efficient Sampling with Bound Tightened","authors":"Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, Jing Tang","doi":"https://dl.acm.org/doi/10.1145/3533817","DOIUrl":null,"url":null,"abstract":"Given a social network G with n nodes and m edges, a positive integer k, and a cascade model C, the influence maximization (IM) problem asks for k nodes in G such that the expected number of nodes influenced by the k nodes under cascade model C is maximized. The state-of-the-art approximate solutions run in O(k(n+m)log n/ε2) expected time while returning a (1 - 1/e - ε) approximate solution with at least 1 - 1/n probability. A key phase of these IM algorithms is the random reverse reachable (RR) set generation, and this phase significantly affects the efficiency and scalability of the state-of-the-art IM algorithms.In this article, we present a study on this key phase and propose an efficient random RR set generation algorithm under IC model. With the new algorithm, we show that the expected running time of existing IM algorithms under IC model can be improved to O(k ċ n log n ċ2), when for any node v, the total weight of its incoming edges is no larger than a constant. For the general IC model where the weights are skewed, we present a sampling algorithm SKIP. To the best of our knowledge, it is the first index-free algorithm that achieves the optimal time complexity of the sorted subset sampling problem.Moreover, existing approximate IM algorithms suffer from scalability issues in high influence networks where the size of random RR sets is usually quite large. We tackle this challenging issue by reducing the average size of random RR sets without sacrificing the approximation guarantee. The proposed solution is orders of magnitude faster than states of the art as shown in our experiment.Besides, we investigate the issues of forward propagation and derive its time complexity with our proposed subset sampling techniques. We also present a heuristic condition to indicate when the forward propagation approach should be utilized to estimate the expected influence of a given seed set.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":"6 4","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2022-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3533817","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Given a social network G with n nodes and m edges, a positive integer k, and a cascade model C, the influence maximization (IM) problem asks for k nodes in G such that the expected number of nodes influenced by the k nodes under cascade model C is maximized. The state-of-the-art approximate solutions run in O(k(n+m)log n/ε²) expected time while returning a (1 - 1/e - ε) approximate solution with at least 1 - 1/n probability. A key phase of these IM algorithms is the random reverse reachable (RR) set generation, and this phase significantly affects the efficiency and scalability of the state-of-the-art IM algorithms.

In this article, we present a study on this key phase and propose an efficient random RR set generation algorithm under IC model. With the new algorithm, we show that the expected running time of existing IM algorithms under IC model can be improved to O(k ċ n log n ċ²), when for any node v, the total weight of its incoming edges is no larger than a constant. For the general IC model where the weights are skewed, we present a sampling algorithm SKIP. To the best of our knowledge, it is the first index-free algorithm that achieves the optimal time complexity of the sorted subset sampling problem.

Moreover, existing approximate IM algorithms suffer from scalability issues in high influence networks where the size of random RR sets is usually quite large. We tackle this challenging issue by reducing the average size of random RR sets without sacrificing the approximation guarantee. The proposed solution is orders of magnitude faster than states of the art as shown in our experiment.

Besides, we investigate the issues of forward propagation and derive its time complexity with our proposed subset sampling techniques. We also present a heuristic condition to indicate when the forward propagation approach should be utilized to estimate the expected influence of a given seed set.

查看原文本刊更多论文

影响最大化再论:边界收紧的有效采样

给定一个有n个节点和m条边的社交网络G，一个正整数k和一个级联模型C，影响最大化(IM)问题要求在G中有k个节点，使得在级联模型C下受k个节点影响的节点的期望数量最大化。最先进的近似解在O(k(n+m)log n/ε2)预期时间内运行，同时返回(1 - 1/e - ε)近似解，概率至少为1 - 1/n。随机反向可达集的生成是IM算法的一个关键阶段，这一阶段对当前IM算法的效率和可扩展性有重要影响。本文对这一关键阶段进行了研究，提出了一种高效的IC模型下随机RR集生成算法。利用新算法，我们证明了现有IM算法在IC模型下的期望运行时间可以提高到O(k * * n log n ċ2)，对于任何节点v，其传入边的总权重不大于一个常数。对于权重偏斜的一般集成电路模型，我们提出了一种SKIP采样算法。据我们所知，它是第一个实现排序子集采样问题最优时间复杂度的无索引算法。此外，现有的近似IM算法在高影响力网络中存在可扩展性问题，其中随机RR集的大小通常相当大。我们通过在不牺牲近似保证的情况下减少随机RR集的平均大小来解决这个具有挑战性的问题。正如我们的实验所示，所提出的解决方案比目前的技术状态快了几个数量级。此外，我们还研究了前向传播问题，并利用我们提出的子集采样技术推导了前向传播的时间复杂度。我们还提出了一个启发式条件，以指示何时应使用前向传播方法来估计给定种子集的预期影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Database Systems 工程技术-计算机：软件工程

CiteScore

5.60

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.