Yifei Xia, Feng Zhang, Qingyu Xu, Mingde Zhang, Zhiming Yao, Lv Lu, Xiaoyong Du, Dong Deng, Bingsheng He, Siqi Ma
GPU-based butterfly counting
The VLDB Journal. Published 2024-06-27. DOI: https://doi.org/10.1007/s00778-024-00861-0
Citations: 0
Abstract
Butterfly counting is a crucial but time-consuming operation on large bipartite graphs. Graphics processing units (GPUs) are widely used parallel heterogeneous devices that can significantly boost the performance of data science programs. However, no existing work enables efficient butterfly counting on GPUs. To fill this gap, we propose a GPU-based butterfly counting method called G-BFC. G-BFC solves three significant technical problems. First, butterfly counting involves massive serial operations, which leads to severe synchronization overhead and performance degradation. We unlock the serial region and use GPU shared memory to handle it efficiently. Second, butterfly counting on GPUs suffers from workload imbalance. To maximize efficiency, we develop a novel adaptive strategy that balances the workload among threads. Third, the large number of two-hop paths, also known as wedges, in bipartite graphs makes traversal costly in parallel butterfly counting. We develop an innovative preprocessing strategy that significantly reduces the number of wedges that must be traversed. We conduct comprehensive experiments on both server-grade and edge-grade GPU platforms, and the results show that G-BFC brings significant performance benefits: it achieves a 4.84\(\times\) speedup over the state-of-the-art solution on eleven real-world datasets.
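To make the role of wedges concrete, the following is a minimal sequential sketch of wedge-based butterfly counting in Python. It is not G-BFC and includes none of the GPU techniques described in the abstract (shared-memory handling, adaptive load balancing, or wedge-reducing preprocessing). The function name `count_butterflies` and the edge-list input format are assumptions made for illustration. A butterfly is a 2x2 biclique, so any two wedges that share the same pair of endpoints form exactly one butterfly.

```python
from collections import defaultdict
from math import comb


def count_butterflies(edges):
    """Count butterflies (2x2 bicliques) in a bipartite graph.

    `edges` is an iterable of (u, v) pairs, with u taken from the left
    vertex set and v from the right vertex set. Left vertex ids must be
    mutually comparable (for example, all integers).
    Hypothetical helper for illustration; this is not the G-BFC algorithm.
    """
    # Map each right vertex to the set of its left neighbours.
    left_of = defaultdict(set)
    for u, v in set(edges):  # set() drops duplicate edges
        left_of[v].add(u)

    # wedges[(u, w)] = number of wedges u - v - w, i.e. the number of
    # right vertices v adjacent to both left vertices u and w (u < w).
    wedges = defaultdict(int)
    for nbrs in left_of.values():
        ordered = sorted(nbrs)
        for i, u in enumerate(ordered):
            for w in ordered[i + 1:]:
                wedges[(u, w)] += 1

    # Choosing any 2 of the c wedges between u and w yields one butterfly.
    return sum(comb(c, 2) for c in wedges.values())
```

For example, `count_butterflies([(0, "a"), (0, "b"), (1, "a"), (1, "b")])` returns 1, since the four edges form a single butterfly. The nested loop over neighbour pairs is where the wedge count grows quickly on high-degree vertices, which is the cost the abstract's preprocessing strategy aims to reduce.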