IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization
Reet Barik, Marco Minutoli, M. Halappanavar, A. Kalyanaraman
2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), published 2022-12-01
DOI: 10.1109/HiPC56025.2022.00028 (https://doi.org/10.1109/HiPC56025.2022.00028)
Abstract
Influence maximization (IM) is a fundamental graph problem that involves simulating a stochastic diffusion process on real-world networks. Given a graph G(V, E), the objective is to identify a small set of key influential "seeds", i.e., a fixed-size set of k nodes that, when influenced, is likely to lead to the maximum number of nodes in the network becoming influenced. The problem has numerous applications, including (but not limited to) viral marketing in social networks, epidemic control in contact networks, and finding influential proteins in molecular networks. Despite its importance, applying influence maximization at scale continues to pose significant challenges. Although the problem is NP-hard, efficient approximation algorithms based on greedy hill climbing are used in practice. However, those algorithms consume hours of multithreaded execution time even on modest-sized inputs with hundreds of thousands of nodes. In this paper, we present IMpart, a partitioning-based approach to accelerate greedy hill-climbing-based IM approaches on both shared- and distributed-memory computers. In particular, we present two parallel algorithms, one that uses graph partitioning (IMpart-metis) and another that uses community-aware partitioning (IMpart-gratis), with provable guarantees on the quality of approximation. Experimental results show that our approaches deliver two to three orders of magnitude speedup over a state-of-the-art multithreaded hill climbing implementation with negligible loss in quality. For instance, on one of the modest-sized inputs (Slashdot: 73K nodes; 905K edges), our partitioning-based shared memory implementation yields a 4610× speedup, reducing the runtime from 9h 36m to 7 seconds on 128 threads. Furthermore, our distributed memory implementation extends the problem size reach to graph inputs on the order of 10^6 nodes and 10^8 edges and enables sub-minute computation of IM solutions.
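
To make the terms in the abstract concrete, the following is a minimal Python sketch of greedy hill climbing for IM, followed by one possible way partitioning could shrink the search. Several things here are assumptions and are not taken from the paper: the independent cascade diffusion model with a uniform edge probability `prob`, the Monte Carlo spread estimate, and in particular the two-stage `partitioned_im` scheme, which is only an illustration of the general idea and not the IMpart-metis or IMpart-gratis algorithm. All function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, prob, rng):
    """One independent-cascade run (assumed model); returns the activated count."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, prob, trials, rng):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    total = sum(simulate_ic(graph, seeds, prob, rng) for _ in range(trials))
    return total / trials


def greedy_im(graph, nodes, k, prob=0.1, trials=200, seed=0):
    """Greedy hill climbing: repeatedly add the candidate with the best estimated spread."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for v in nodes:
            if v in seeds:
                continue
            spread = expected_spread(graph, seeds + [v], prob, trials, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        seeds.append(best_node)
    return seeds


def induced_subgraph(graph, part):
    """Keep only the edges whose endpoints both lie inside the partition."""
    members = set(part)
    return {u: [v for v in graph.get(u, ()) if v in members] for u in part}


def partitioned_im(graph, parts, k, prob=0.1, trials=200, seed=0):
    """Hypothetical two-stage scheme (an assumption, not the paper's algorithm):
    pick k local seeds per partition, then run greedy on the full graph
    restricted to that much smaller candidate pool."""
    candidates = []
    for part in parts:
        sub = induced_subgraph(graph, part)
        candidates.extend(greedy_im(sub, sorted(part), k, prob, trials, seed))
    return greedy_im(graph, sorted(candidates), k, prob, trials, seed)


if __name__ == "__main__":
    # Toy directed graph: a star centered at node 0 and a separate edge 6 -> 7.
    toy = {0: [1, 2, 3, 4, 5], 6: [7]}
    print(greedy_im(toy, list(range(8)), k=2, prob=1.0, trials=1))
    print(partitioned_im(toy, [[0, 1, 2, 3, 4, 5], [6, 7]], k=2, prob=1.0, trials=1))
```

In this sketch, plain greedy evaluates every remaining node in each of the k rounds, and each evaluation costs `trials` simulations, which is where the long runtimes mentioned in the abstract come from. The per-partition calls in `partitioned_im` are independent of one another, so under these assumptions they could run in parallel, and the final pass only considers at most k candidates per partition.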