Optimization and realization of parallel frequent item set mining algorithm

Ling Yuan, Dan Li, Yuzhong Chen
Published in: 2016 International Conference on Audio, Language and Image Processing (ICALIP), 2016-07-01. DOI: 10.1109/ICALIP.2016.7846585 (https://doi.org/10.1109/ICALIP.2016.7846585)
Citations: 1

Abstract

Association mining is an active research topic in the field of big data, and frequent item set mining is an important step in association analysis. This paper analyzes frequent item set mining based on the parallel Apriori algorithm and identifies two shortcomings of that algorithm: it generates too many key-value pairs, and it occupies too much memory in the combiner stage. We therefore propose an optimized algorithm. It keeps candidate item sets and local count information in memory, which greatly reduces the number of generated keys. In addition, when mining short frequent item sets, it reduces the number of scans over the transaction data and does not generate candidate item sets, which improves efficiency. We run experiments on the Hadoop platform to evaluate the proposed algorithm. The experiments show that it substantially improves running time and I/O compared with the non-optimized algorithm.
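
The abstract does not give the implementation, so the following is only a minimal sketch in plain Python (not Hadoop code) of the two ideas it describes. All function names and the toy data are hypothetical. `naive_map` emits one key-value pair per candidate match, `in_memory_map` keeps local counts in memory and emits one pair per distinct candidate in a split, and `short_itemsets_one_pass` counts all item sets up to a small length in a single scan without a candidate generation step.

```python
from collections import Counter
from itertools import combinations


def naive_map(transactions, candidates):
    """Emit one (candidate, 1) pair for every candidate found in a transaction."""
    pairs = []
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if set(candidate) <= items:
                pairs.append((candidate, 1))
    return pairs


def in_memory_map(transactions, candidates):
    """Accumulate local counts in memory, then emit one pair per distinct candidate."""
    local_counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if set(candidate) <= items:
                local_counts[candidate] += 1
    return list(local_counts.items())


def reduce_counts(pairs, min_support):
    """Sum the counts per key and keep item sets that reach min_support."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return {key: count for key, count in totals.items() if count >= min_support}


def short_itemsets_one_pass(transactions, max_len=2):
    """Count every item set of length 1..max_len in one scan, with no candidate list."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for length in range(1, max_len + 1):
            for itemset in combinations(items, length):
                counts[itemset] += 1
    return counts


# Toy data (hypothetical): two input splits and three candidate 2-item sets.
split_a = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
split_b = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
candidates = [("a", "b"), ("a", "c"), ("b", "c")]

naive_pairs = naive_map(split_a, candidates) + naive_map(split_b, candidates)
local_pairs = in_memory_map(split_a, candidates) + in_memory_map(split_b, candidates)

print(len(naive_pairs), len(local_pairs))        # emitted key-value pairs
print(reduce_counts(local_pairs, min_support=4))  # frequent item sets
print(short_itemsets_one_pass(split_a + split_b)[("a", "b")])
```

In this sketch both mappers lead to the same reduced counts; the in-memory variant simply hands fewer pairs to the shuffle, at the cost of holding one counter per candidate in each mapper. How the paper bounds that memory use is not stated in the abstract.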