Parallel mining frequent patterns over big transactional data in extended mapreduce
Authors: Hui Chen, T. Lin, Zhibing Zhang, Jie Zhong
Published in: 2013 IEEE International Conference on Granular Computing (GrC), 2013-12-01
DOI: 10.1109/GrC.2013.6740378 (https://doi.org/10.1109/GrC.2013.6740378)
Citations: 15
Abstract
In the big data era, data sizes have grown from the terabyte (TB) level to the petabyte (PB) level, and traditional algorithms cannot meet the needs of big data computing. This paper designs a parallel algorithm for mining frequent patterns over big transactional data, based on an extended MapReduce framework. The large data file is first split into many data subfiles; the patterns in each subfile can then be located quickly by bitmap computation, scanning the data only once. The results computed for all subfiles are then merged to mine the frequent patterns of the whole data set. To improve the performance of the proposed method, insignificant patterns are pruned by a statistical analysis method while the data subfiles are processed. The experimental results show that the method is efficient and scalable, and can be used to mine frequent patterns in big data.
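The split, bitmap-count, and merge steps described in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the authors' implementation: the abstract does not specify the bitmap layout, the MapReduce extension, or the statistical pruning rule, so everything here is an assumption. The function names (`build_bitmaps`, `map_chunk`, `mine`) and the parameters (`chunk_size`, `max_size`, `local_min_count`) are hypothetical, Python integers stand in for bitmaps, and a plain per-chunk count threshold stands in for the paper's pruning method.

```python
from collections import Counter
from itertools import combinations


def build_bitmaps(chunk):
    """Scan one chunk once; bit i of bitmaps[item] is set when transaction i contains item."""
    bitmaps = {}
    for i, transaction in enumerate(chunk):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def map_chunk(chunk, max_size=2, local_min_count=1):
    """'Map' step (assumed): local support of every itemset up to max_size in one chunk."""
    bitmaps = build_bitmaps(chunk)
    items = sorted(bitmaps)
    counts = Counter()
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # AND the item bitmaps; the set bits mark transactions containing the whole itemset.
            bits = bitmaps[itemset[0]]
            for item in itemset[1:]:
                bits &= bitmaps[item]
            support = bin(bits).count("1")
            # Stand-in for the paper's statistical pruning: drop locally rare patterns.
            if support >= local_min_count:
                counts[itemset] = support
    return counts


def mine(transactions, chunk_size, min_support, max_size=2, local_min_count=1):
    """Split the data into chunks, count each chunk, then merge ('reduce') the counts."""
    chunks = [transactions[i:i + chunk_size]
              for i in range(0, len(transactions), chunk_size)]
    total = Counter()
    for chunk in chunks:  # sequential here; a MapReduce job would run these in parallel
        total.update(map_chunk(chunk, max_size, local_min_count))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, n in sorted(mine(data, chunk_size=2, min_support=3).items()):
        print(itemset, n)
```

With `local_min_count=1` only patterns absent from a chunk are dropped, so the merged counts are exact. Raising it discards more candidates per chunk at the cost of possibly undercounting patterns that are rare locally but frequent overall, which is the trade-off any per-subfile pruning rule has to manage.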