ClusPar: A Game-Theoretic Approach for Efficient and Scalable Streaming Edge Partitioning
Authors: Zezhong Ding; Deyu Kong; Zhuoxu Zhang; Xike Xie; Jianliang Xu
Journal: IEEE Transactions on Computers, vol. 74, no. 1, pp. 116-130 (impact factor 3.6; JCR Q2, Computer Science, Hardware & Architecture)
DOI: 10.1109/TC.2024.3475568
Published: 2024-10-08
URL: https://ieeexplore.ieee.org/document/10707328/
Citations: 0
Abstract
Streaming edge partitioning plays a crucial role in the distributed processing of large-scale web graphs, for example when running PageRank. Partitioning quality directly affects the runtime cost of distributed graph processing. However, streaming graph clustering, a key component of mainstream streaming edge partitioning, is vertex-centric. This mismatches the edge-centric partitioning strategy and necessitates additional post-processing and several graph traversals to transition from vertex-centric clusters to edge-centric partitions. This transition not only adds runtime overhead but also risks degrading partitioning quality. In this paper, we propose a novel algorithm, called ClusPar, to address the problem of streaming edge partitioning. The ClusPar framework consists of two steps: streaming edge clustering and edge cluster partitioning. Unlike prior studies, the first step traverses the input graph in a single pass to generate edge-centric clusters, while the second step applies game theory over these edge-centric clusters to produce partitions. Extensive experiments show that ClusPar outperforms state-of-the-art streaming edge partitioning methods in partitioning quality, efficiency, and scalability.
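The two-step structure described in the abstract can be illustrated with a toy sketch. This is not the ClusPar algorithm: the abstract does not specify the clustering rule or the game, so everything below is an assumption made for illustration. The function names (`stream_edge_clusters`, `best_response_partition`, `replication_factor`), the greedy "join a cluster that already touches an endpoint" rule, the cost function (a load term weighted by a hypothetical `alpha` plus a count of newly replicated vertices), and the best-response loop are all hypothetical. The replication factor, a metric commonly used to judge edge partitions, is included only to make the output easy to inspect.

```python
from collections import defaultdict


def stream_edge_clusters(edges, max_cluster_size):
    """Step 1 (toy): one pass over the edge stream, producing edge-centric clusters.

    Assumed greedy rule: put the edge into a non-full cluster that already
    touches one of its endpoints, otherwise open a new cluster.
    """
    vertex_cluster = {}                # most recent cluster seen for each vertex
    cluster_edges = defaultdict(list)  # cluster id -> list of edges
    next_id = 0
    for u, v in edges:
        target = None
        for w in (u, v):
            c = vertex_cluster.get(w)
            if c is not None and len(cluster_edges[c]) < max_cluster_size:
                target = c
                break
        if target is None:
            target = next_id
            next_id += 1
        cluster_edges[target].append((u, v))
        vertex_cluster[u] = target
        vertex_cluster[v] = target
    return dict(cluster_edges)


def best_response_partition(clusters, k, alpha=1.0, max_rounds=50):
    """Step 2 (toy): each cluster is a player that picks one of k partitions.

    Assumed cost of cluster c in partition p: alpha * normalized load of p
    plus the number of c's vertices not already present in p.
    """
    ids = sorted(clusters)
    if not ids:
        return {}
    verts = {c: {x for e in clusters[c] for x in e} for c in ids}
    size = {c: len(clusters[c]) for c in ids}
    total = sum(size.values())
    assign = {c: i % k for i, c in enumerate(ids)}  # round-robin start

    def cost(c, p):
        others = [d for d in ids if d != c and assign[d] == p]
        load = size[c] + sum(size[d] for d in others)
        present = set().union(*(verts[d] for d in others))
        return alpha * k * load / total + len(verts[c] - present)

    # Repeat best responses until no cluster wants to move (or the cap is hit).
    for _ in range(max_rounds):
        changed = False
        for c in ids:
            best = min(range(k), key=lambda p: (cost(c, p), p))
            if cost(c, best) < cost(c, assign[c]):
                assign[c] = best
                changed = True
        if not changed:
            break
    return assign


def replication_factor(clusters, assign):
    """Average number of partitions in which each vertex appears."""
    parts = defaultdict(set)
    for c, edges in clusters.items():
        for u, v in edges:
            parts[u].add(assign[c])
            parts[v].add(assign[c])
    return sum(len(p) for p in parts.values()) / len(parts)


# Usage: two disjoint triangles, split across two partitions.
edges = [(0, 1), (1, 2), (2, 0), (10, 11), (11, 12), (12, 10)]
clusters = stream_edge_clusters(edges, max_cluster_size=3)
assign = best_response_partition(clusters, k=2)
print(clusters, assign, replication_factor(clusters, assign))
```

Because clusters rather than individual edges are the players, the second step works on a much smaller input than the original stream, which is one plausible reason to cluster edges first. The round cap `max_rounds` is there because this toy cost function is not shown to guarantee convergence.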
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.