分而治之:I/O高效深度优先搜索

Zhiwei Zhang, J. Yu, Lu Qin, Zechao Shang
{"title":"分而治之:I/O高效深度优先搜索","authors":"Zhiwei Zhang, J. Yu, Lu Qin, Zechao Shang","doi":"10.1145/2723372.2723740","DOIUrl":null,"url":null,"abstract":"Depth-First Search (DFS), which traverses a graph in the depth- first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a DFS-Tree. There are many graph algorithms that need DFS such as connected component computation, topological sort, community detection, eulerian path computation, graph bipartiteness testing, planar graph testing, etc, because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DFS-Tree for the whole graph. With the global DFS-Tree computed we identify DFS. We discuss the valid division, that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Divide & Conquer: I/O Efficient Depth-First Search\",\"authors\":\"Zhiwei Zhang, J. Yu, Lu Qin, Zechao Shang\",\"doi\":\"10.1145/2723372.2723740\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Depth-First Search (DFS), which traverses a graph in the depth- first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a DFS-Tree. There are many graph algorithms that need DFS such as connected component computation, topological sort, community detection, eulerian path computation, graph bipartiteness testing, planar graph testing, etc, because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DFS-Tree for the whole graph. With the global DFS-Tree computed we identify DFS. We discuss the valid division, that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.\",\"PeriodicalId\":168391,\"journal\":{\"name\":\"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2723372.2723740\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723372.2723740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

摘要

深度优先搜索(DFS)是一种基本的图操作,它以深度一阶的方式遍历图,对G中的所有节点进行DFS搜索的结果是一棵生成树,称为DFS树。有很多图的算法都需要DFS,如连通分量计算、拓扑排序、社区检测、欧拉路径计算、图的双分性测试、平面图测试等,因为内存DFS算法表明它可以在线性时间内完成,而不是g的大小。然而,考虑到现实世界的图在大数据时代快速增长,内存DFS算法不能用于处理不能完全保存在主存中的大型图。本文以I/O效率为核心,研究了磁盘上的图形G的半外部DFS算法。这里,与现有的半外部算法一样,我们假设G的生成树可以保存在主存中,G的剩余边保存在磁盘上,并在主存中计算DFS树,用它来识别DFS。我们提出了一种新的分治算法来处理磁盘上的图G上的DFS。简而言之,我们将一个图划分为几个子图,分别计算每个子图的dfs树,然后将它们合并在一起计算整个图的dfs树。通过计算全局DFS树,我们确定了DFS。我们讨论了可以导致正确DFS的有效划分,以及这样做的挑战。我们提出了两种除法算法,分别是Divide-Star和Divide-TD,以及一种合并算法。我们使用四个真实的大规模数据集和几个合成数据集进行了广泛的实验研究,以确认我们的方法的I/O效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Divide & Conquer: I/O Efficient Depth-First Search
Depth-First Search (DFS), which traverses a graph in the depth- first order, is one of the fundamental graph operations, and the result of DFS over all nodes in G is a spanning tree known as a DFS-Tree. There are many graph algorithms that need DFS such as connected component computation, topological sort, community detection, eulerian path computation, graph bipartiteness testing, planar graph testing, etc, because the in-memory DFS algorithm shows it can be done in linear time w.r.t. the size of G. However, given the fact that real-world graphs grow rapidly in the big data era, the in-memory DFS algorithm cannot be used to handle a large graph that cannot be entirely held in main memory. In this paper, we focus on I/O efficiency and study semi-external algorithms to DFS a graph G which is on disk. Here, like the existing semi-external algorithms, we assume that a spanning tree of G can be held in main memory and the remaining edges of G are kept on disk, and compute the DFS-Tree in main memory with which DFS can be identified. We propose novel divide & conquer algorithms to DFS over a graph G on disk. In brief, we divide a graph into several subgraphs, compute the DFS-Tree for each subgraph independently, and then merge them together to compute the DFS-Tree for the whole graph. With the global DFS-Tree computed we identify DFS. We discuss the valid division, that can lead to the correct DFS, and the challenges to do so. We propose two division algorithms, named Divide-Star and Divide-TD, and a merge algorithm. We conduct extensive experimental studies using four real massive datasets and several synthetic datasets to confirm the I/O efficiency of our approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信