Toward Fast and Scalable Random Walks over Disk-Resident Graphs via Efficient I/O Management

IF 2.1 | Zone 3 (Computer Science) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Storage | Pub Date: 2022-11-11 | DOI: https://dl.acm.org/doi/10.1145/3533579

Rui Wang, Yongkun Li, Yinlong Xu, Hong Xie, John C. S. Lui, Shuibing He


Citations: 0

Abstract

Traditional graph systems mainly use the iteration-based model, which iteratively loads graph blocks into memory for analysis so as to reduce random I/Os. However, this iteration-based model limits the efficiency and scalability of running random walks, a fundamental technique for analyzing large graphs. In this article, we first propose a state-aware I/O model to improve the I/O efficiency of running random walks; we then develop a block-centric indexing and buffering scheme for managing walk data, and leverage an asynchronous walk updating strategy to improve random walk efficiency. We implement an I/O-efficient graph system, GraphWalker, which efficiently handles very large disk-resident graphs and scales to tens of billions of random walks on a single commodity machine. Experiments show that GraphWalker can achieve more than an order of magnitude speedup over DrunkardMob, which is tailored for random walks and built on the classical graph system GraphChi, as well as over two state-of-the-art single-machine graph systems, Graphene and GraFSoft. Furthermore, when compared with the most recent distributed system KnightKing, GraphWalker still achieves comparable performance with only a single machine, making it a more cost-effective alternative.
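To make the three ideas named in the abstract concrete, here is a toy, in-memory Python sketch. It is not GraphWalker's code, and the abstract does not spell out any of these details: the function name `run_walks`, the fixed-size vertex blocks, and the policy of loading the block that currently holds the most walks are all assumptions chosen for illustration. Walks are kept in per-block pools (block-centric management), the next block is picked from the walks' current state (state-aware I/O), and each walk keeps stepping until it leaves the loaded block (asynchronous updating).

```python
import random
from collections import defaultdict


def run_walks(adj, block_size, starts, walk_len, seed=0):
    """Toy simulation of block-at-a-time random walks (illustrative only).

    adj:        dict mapping vertex -> list of out-neighbours; stands in
                for the disk-resident graph
    block_size: vertices per block; vertex v is assumed to live in
                block v // block_size
    starts:     iterable of start vertices, one walk per entry
    walk_len:   number of steps each walk should take
    Returns (final_vertices, block_loads).
    """
    rng = random.Random(seed)

    def block_of(v):
        return v // block_size

    # Block-centric walk pools: walks grouped by the block they sit in.
    pools = defaultdict(list)
    for v in starts:
        pools[block_of(v)].append((v, walk_len))  # (vertex, steps left)

    finished, block_loads = [], 0
    while pools:
        # Assumed state-aware policy: load the block holding the most walks.
        b = max(pools, key=lambda k: len(pools[k]))
        walks = pools.pop(b)
        block_loads += 1  # counts one simulated block read

        for v, left in walks:
            # Asynchronous updating: step this walk for as long as it
            # stays inside block b, without waiting for other walks.
            while left > 0 and block_of(v) == b and adj.get(v):
                v = rng.choice(adj[v])
                left -= 1
            if left > 0 and block_of(v) != b:
                # The walk left the block: park it in the new block's pool.
                pools[block_of(v)].append((v, left))
            else:
                # The walk used all its steps or hit a dead end in block b.
                finished.append(v)
    return finished, block_loads
```

On a small ring graph such as `adj = {v: [(v + 1) % 8] for v in range(8)}` with `block_size=4`, walks that stay inside one block complete with a single simulated block load, which is the effect the scheduling is meant to illustrate.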




Source journal

ACM Transactions on Storage | COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

4.20

Self-citation rate

5.90%

Publication volume

Review time

>12 weeks

Journal description: The ACM Transactions on Storage (TOS) is a new journal with an intent to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises network protocols, resource management, data backup, replication, recovery, devices, security, and theory of data coding, densities, and low-power. Potential synergies among these fields are expected to open up new research directions.