Parallel edge-based sampling for static and dynamic graphs
Kartik Lakhotia, R. Kannan, Aditya Gaur, Ajitesh Srivastava, V. Prasanna
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019-04-30
DOI: https://doi.org/10.1145/3310273.3323052
Citations: 2
Abstract
Graph sampling is an important tool for obtaining small, manageable subgraphs from large real-world graphs. Prior research has shown that Induced Edge Sampling (IES) outperforms other sampling methods in terms of the quality of the subgraph obtained. Although fast sampling is crucial for several workflows, there has been little prior work on parallel sampling algorithms. In this paper, we present parIES, a framework for parallel Induced Edge Sampling on shared-memory parallel machines. Equipped with optimized load-balancing and synchronization-avoiding strategies, parIES can sample both static and streaming dynamic graphs while achieving high scalability and parallel efficiency. We develop a lightweight concurrent hash table coupled with a space-efficient dynamic graph data structure to overcome the challenges and memory constraints of sampling streaming dynamic graphs. We evaluate parIES on a 16-core (32-thread) Intel server using 7 large synthetic and real-world networks. From a static graph, parIES can sample a subgraph with more than 1.4B edges in under 2.5 s and achieve up to 15.5× parallel speedup. For dynamic streaming graphs, parIES can process up to 86.7M edges per second, achieving 15× parallel speedup.
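For readers unfamiliar with induced edge sampling, the following is a minimal sequential sketch of the idea as it is commonly described: select edges uniformly at random, collect their endpoints until a target number of vertices is reached, then keep every edge of the original graph whose two endpoints were both sampled. It is not the parIES implementation, and it includes none of the load balancing, concurrent hash table, or streaming machinery the abstract mentions. The function name `induced_edge_sample`, its parameters, and the edge-list representation are assumptions made for illustration only.

```python
import random


def induced_edge_sample(edges, target_vertices, seed=0):
    """Sequential sketch of induced edge sampling on an undirected edge list.

    `edges` is a list of (u, v) tuples. All names here are hypothetical and
    chosen for illustration; this is not the parIES API.
    """
    rng = random.Random(seed)

    # Cap the target at the number of vertices that appear in some edge,
    # so the sampling loop below is guaranteed to terminate.
    all_vertices = {v for edge in edges for v in edge}
    target = min(target_vertices, len(all_vertices))

    # Step 1 (edge sampling): pick edges uniformly at random, with
    # replacement, and record both endpoints of each picked edge.
    sampled = set()
    while len(sampled) < target:
        u, v = rng.choice(edges)
        sampled.add(u)
        sampled.add(v)

    # Step 2 (induction): keep every original edge whose endpoints are
    # both in the sampled vertex set.
    induced = [(u, v) for (u, v) in edges if u in sampled and v in sampled]
    return sampled, induced


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5)]
    vertices, subgraph = induced_edge_sample(toy_edges, target_vertices=3, seed=1)
    print(sorted(vertices), subgraph)
```

Because each picked edge can contribute two new vertices, the sampled vertex set may exceed the target by one. The induction step is a single pass over the full edge list, so in this sketch it dominates the cost on large graphs.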