Large Scale Frequent Pattern Mining Using MPI One-Sided Model
Abhinav Vishnu, Khushbu Agarwal
2015 IEEE International Conference on Cluster Computing, published 2015-09-08
DOI: 10.1109/CLUSTER.2015.30
Citations: 9
Abstract
In this paper, we propose the Library for Work Stealing (LibWS), a work-stealing runtime built on the MPI one-sided communication model, and use it to design a scalable implementation of FP-Growth, the de facto frequent pattern mining algorithm, for large-scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing across a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art Θ(p) to Θ(f + p/f), for p processes and f frequent attribute IDs. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates high efficiency across work distributions (91% for power-law and 93% for Poisson). The proposed distributed FP-Tree merging algorithm provides a 38x communication speedup on 4096 cores.
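For readers less familiar with FP-Growth, the following is a minimal serial sketch of the textbook algorithm: build an FP-tree from frequency-ordered transactions, then recursively mine each item's conditional pattern base. It is not LibWS and not the authors' distributed implementation; the names `FPNode`, `build_fp_tree` and `fp_growth` are hypothetical, and it assumes transactions arrive as (items, count) pairs whose items are mutually sortable (e.g. strings).

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its accumulated count, a parent link, children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): `header` maps each frequent item to every
    tree node holding it; `frequent` maps each frequent item to its support.
    """
    item_counts = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            item_counts[item] += count
    frequent = {i: c for i, c in item_counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Drop infrequent items and order the rest by descending support
        # (ties broken by item value) so common prefixes share one path.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset itemset, support) for every frequent itemset."""
    header, frequent = build_fp_tree(transactions, min_support)
    # Visit items from least to most frequent, as in classic FP-Growth.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        new_suffix = (item,) + suffix
        yield frozenset(new_suffix), frequent[item]
        # Conditional pattern base: the prefix path above each node of
        # `item`, weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Mine the conditional database recursively with the extended suffix.
        yield from fp_growth(conditional, min_support, new_suffix)
```

For example, `dict(fp_growth([(t, 1) for t in db], 3))` maps each itemset occurring in at least three transactions of `db` to its support count. Each iteration of the outer loop in `fp_growth` mines one item's conditional pattern base independently of the others, so these per-item tasks are a natural unit for load balancing; the abstract does not state which task granularity LibWS actually uses.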