PDTL: Parallel and Distributed Triangle Listing for Massive Graphs

Ilias Giechaskiel, G. Panagopoulos, Eiko Yoneki
Published in: 2015 44th International Conference on Parallel Processing (2015-09-01)
DOI: 10.1109/ICPP.2015.46 (https://doi.org/10.1109/ICPP.2015.46)
Citations: 30

Abstract

This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics, but the majority of existing algorithms for massive graphs are sequential, while distributed versions of algorithms do not guarantee their CPU, I/O, Memory, or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. PDTL suits a variety of computational environments, from single-core machines to high-end clusters, and computes the exact triangle count on graphs of over 6B edges and 1B vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2x to 4x. Our approach thus highlights the importance of I/O in a distributed environment.
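To make the orientation idea mentioned in the abstract concrete, here is a minimal in-memory sketch of orientation-based triangle listing. It is not PDTL itself: the abstract does not specify the ordering used, so the degree-then-id ordering, the function name `list_triangles`, and the edge-list input format are all assumptions for illustration, and the sketch ignores the external-memory, load-balancing, and distribution aspects entirely.

```python
from collections import defaultdict


def list_triangles(edges):
    """Return every triangle once, as a sorted tuple of vertex ids.

    Hypothetical helper for illustration; not the PDTL implementation.
    """
    # Build an undirected adjacency map, ignoring self-loops and duplicates.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, then by id to break ties
    # (an assumed ordering; the source does not state which one is used).
    def rank(v):
        return (len(adj[v]), v)

    # Orientation: keep only edges pointing from lower rank to higher rank.
    out = {v: {w for w in nbrs if rank(v) < rank(w)} for v, nbrs in adj.items()}

    triangles = []
    for u, out_u in out.items():
        for v in out_u:
            # A common out-neighbour w closes the triangle u -> v -> w.
            for w in out_u & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)


if __name__ == "__main__":
    print(list_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]))
```

Because every edge is directed from its lower-ranked to its higher-ranked endpoint, each triangle is discovered exactly once, from its lowest-ranked vertex, so no deduplication pass is needed.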