Optimization and realization of parallel frequent item set mining algorithm

2016 International Conference on Audio, Language and Image Processing (ICALIP), published 2016-07-01
DOI: 10.1109/ICALIP.2016.7846585

Ling Yuan, Dan Li, Yuzhong Chen


Citations: 1

Abstract

Associative data mining is a research hotspot in the field of big data, and frequent item set mining is an important step in the analysis of associative data. This paper analyzes frequent item set mining based on the parallel Apriori algorithm and identifies two shortcomings of that algorithm: it generates too many key-value pairs, and its combiner stage occupies too much memory. We therefore propose an optimized algorithm in which candidate item sets and local count information are kept in memory, greatly reducing the number of generated keys. In addition, when mining short frequent item sets, the algorithm improves efficiency by reducing the number of scans over the transaction data, without generating candidate item sets. We conducted experiments on the Hadoop platform to verify the performance of the proposed algorithm. The results show that, compared with the non-optimized algorithm, the optimized algorithm greatly improves both running time and I/O.
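The abstract does not give the algorithm itself, so the following is only a minimal sketch, assuming the optimization resembles "in-mapper" local aggregation. It simulates one Apriori counting pass as MapReduce in plain Python rather than on Hadoop. `naive_mapper` emits one (item set, 1) pair per occurrence, while `in_memory_mapper` keeps candidate counts in a local dictionary and emits one pair per distinct candidate per input split. `direct_pair_mapper` is one plausible, hypothetical reading of "without generating candidate item sets" for short (length-2) item sets: it enumerates pairs directly from each transaction. All function names and the toy data are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
Split = Sequence[Sequence[str]]  # hypothetical: one map task's share of transactions
KV = Tuple[Itemset, int]


def naive_mapper(split: Split, candidates: Set[Itemset]) -> List[KV]:
    """Emit (candidate, 1) for every candidate found in every transaction."""
    out: List[KV] = []
    for txn in split:
        items = set(txn)
        for cand in candidates:
            if cand <= items:
                out.append((cand, 1))
    return out


def in_memory_mapper(split: Split, candidates: Set[Itemset]) -> List[KV]:
    """Accumulate local counts in memory; emit one pair per distinct candidate seen."""
    local: Counter = Counter()
    for txn in split:
        items = set(txn)
        for cand in candidates:
            if cand <= items:
                local[cand] += 1
    return list(local.items())


def direct_pair_mapper(split: Split) -> List[KV]:
    """Assumed short-item-set variant: count 2-item sets with no candidate set."""
    local: Counter = Counter()
    for txn in split:
        for pair in combinations(sorted(set(txn)), 2):
            local[frozenset(pair)] += 1
    return list(local.items())


def reducer(pairs: List[KV], min_support: int) -> Dict[Itemset, int]:
    """Sum counts per item set and keep those meeting the support threshold."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return {key: n for key, n in totals.items() if n >= min_support}


# Toy data: two input splits, as if processed by two map tasks.
splits = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"], ["a", "b", "c"]],
]
MIN_SUPPORT = 3

# Pass 1: frequent single items, counted with local aggregation.
singletons = {frozenset([i]) for s in splits for t in s for i in t}
freq_1 = reducer([kv for s in splits for kv in in_memory_mapper(s, singletons)], MIN_SUPPORT)

# Candidate 2-item sets joined from the frequent single items (Apriori step).
items = sorted(set().union(*freq_1))
cand_2 = {frozenset(p) for p in combinations(items, 2)}

naive_kv = [kv for s in splits for kv in naive_mapper(s, cand_2)]
local_kv = [kv for s in splits for kv in in_memory_mapper(s, cand_2)]
direct_kv = [kv for s in splits for kv in direct_pair_mapper(s)]

print("key-value pairs emitted: naive", len(naive_kv), "in-memory", len(local_kv))
print("frequent 2-item sets:", {tuple(sorted(k)): v for k, v in reducer(local_kv, MIN_SUPPORT).items()})
```

On this toy data the naive mapper emits 8 key-value pairs and the in-memory mapper emits 6. All three approaches find the same frequent 2-item sets, {a, b} and {a, c}, each with support 3. The gap widens as splits grow, because the in-memory mapper's output is bounded by the number of distinct candidates per split rather than by the number of occurrences. The trade-off is that the local dictionary must fit in the map task's memory.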
