Large Scale Frequent Pattern Mining Using MPI One-Sided Model

2015 IEEE International Conference on Cluster Computing. Published: 2015-09-08. DOI: 10.1109/CLUSTER.2015.30

Abhinav Vishnu, Khushbu Agarwal

Citations: 9

Abstract

In this paper, we propose a work-stealing runtime, the Library for Work Stealing (LibWS), built on the MPI one-sided model, for designing a scalable FP-Growth, the de facto frequent pattern mining algorithm, on large-scale systems. LibWS provides locality-efficient, highly scalable work-stealing techniques for load balancing across a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art Θ(p) to Θ(f + p/f), for p processes and f frequent attribute IDs. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (91% for the power-law distribution and 93% for the Poisson distribution). The proposed distributed FPTree merging algorithm provides a 38x communication speedup on 4096 cores.
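To give a sense of how large the reduction from Θ(p) to Θ(f + p/f) can be, the following Python sketch evaluates both bounds numerically with every constant factor set to 1. It is only an illustration of the asymptotic formulas, not the paper's communication algorithm. The function names `exchange_cost_old` and `exchange_cost_new` are hypothetical, and because f is determined by the data and the support count rather than chosen freely, the values of f tried here are assumptions. The expression f + p/f is smallest at f = √p, where it equals 2√p, so with p = 4096 the bound can fall to 128 units compared with 4096.

```python
import math


def exchange_cost_old(p: int) -> int:
    """Units proportional to the Theta(p) bound (constant factor assumed to be 1)."""
    return p


def exchange_cost_new(p: int, f: int) -> float:
    """Units proportional to the Theta(f + p/f) bound (constant factor assumed to be 1)."""
    return f + p / f


p = 4096  # process count used in the paper's evaluation

# Hypothetical counts of frequent attribute IDs; real values depend on the dataset.
for f in (8, 64, 512, 4096):
    print(f"f={f:5d}: Theta(p) -> {exchange_cost_old(p)}, "
          f"Theta(f + p/f) -> {exchange_cost_new(p, f):.1f}")

# By the AM-GM inequality, f + p/f >= 2*sqrt(p), with equality at f = sqrt(p).
f_best = math.isqrt(p)
print(f_best, exchange_cost_new(p, f_best))  # -> 64 128.0
```

The sweep also shows that the new bound helps most when f is well below p: at f = 4096 it gives 4097 units, no better than Θ(p). Since Θ notation hides constants, the ratio of the two values is not a predicted speedup; the 38x figure in the abstract is a measured result.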
