Finding and Counting Tree-Like Subgraphs Using MapReduce

Zhao Zhao, Langshi Chen, Mihai Avram, Meng Li, Guanying Wang, Ali Butt, Maleq Khan, Madhav Marathe, Judy Qiu, Anil Vullikanti
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 217-230. Published 2017-10-31.
DOI: 10.1109/TMSCS.2017.2768426
https://ieeexplore.ieee.org/document/8090537/
Citations: 3

Abstract

Several variants of the subgraph isomorphism problem, e.g., finding, counting, and estimating frequencies of subgraphs in networks, arise in a number of real-world applications, such as web analysis, disease diffusion prediction, and social network analysis. These problems are computationally challenging because they must scale to very large networks with millions of vertices. In this paper, we present SAHAD, a MapReduce algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon et al. SAHAD is a randomized algorithm, and we show rigorous bounds on its approximation quality and performance. SAHAD scales to very large networks comprising 10^7 to 10^8 vertices and 10^8 to 10^9 edges, and to tree-like (acyclic) templates with up to 12 vertices. Further, we extend our results by implementing SAHAD in the Harp framework, which is closer to a high-performance computing environment. The new implementation gives a 100x improvement in performance over the standard Hadoop implementation and achieves better performance than state-of-the-art MPI solutions on larger graphs.
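
The abstract names color coding but does not describe it. The following is a minimal single-machine sketch of the general color coding idea for tree templates, not SAHAD's MapReduce or Harp implementation. Vertices are colored at random with k colors, a dynamic program over the rooted template counts "colorful" mappings (all k image vertices have distinct colors), and the count is scaled by k^k / k! because a fixed mapping is colorful with probability k! / k^k. The function names, the dictionary-based graph and template representation, and the parameters `trials` and `seed` are assumptions made for illustration.

```python
import random
from collections import defaultdict
from math import factorial


def count_colorful(adj, colors, tree_children, root):
    """Count colorful mappings of a rooted tree template into a colored graph.

    adj:           dict vertex -> list of neighbours (undirected graph)
    colors:        dict vertex -> integer color
    tree_children: dict template node -> list of child template nodes
    root:          template node chosen as the root
    """
    def solve(t):
        # table[v][mask] = number of colorful mappings of the subtree of t
        # that send t to v and use exactly the colors in the bitmask `mask`.
        table = {v: {1 << colors[v]: 1} for v in adj}
        for c in tree_children.get(t, []):
            child = solve(c)
            merged = {}
            for v in adj:
                acc = defaultdict(int)
                for s1, n1 in table[v].items():
                    for u in adj[v]:  # child c must map to a neighbour of v
                        for s2, n2 in child[u].items():
                            if s1 & s2 == 0:  # color sets must be disjoint
                                acc[s1 | s2] += n1 * n2
                merged[v] = dict(acc)
            table = merged
        return table

    root_table = solve(root)
    return sum(sum(by_mask.values()) for by_mask in root_table.values())


def estimate_mappings(adj, tree_children, root, k, trials=100, seed=0):
    """Estimate the number of injective mappings of a k-node tree template."""
    rng = random.Random(seed)
    scale = k ** k / factorial(k)  # inverse of the colorful probability
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful(adj, colors, tree_children, root)
    return scale * total / trials


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    path3 = {0: [1], 1: [2]}  # template path 0 - 1 - 2, rooted at 0
    print(count_colorful(triangle, {0: 0, 1: 1, 2: 2}, path3, 0))
    print(estimate_mappings(triangle, path3, 0, k=3, trials=200, seed=1))
```

The result counts mappings of the rooted template, so dividing by the number of automorphisms of the template gives the number of distinct subgraph copies. The tables are indexed by color subsets, which is why the cost grows exponentially in the template size k but only polynomially in the graph size.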