Optimization and realization of parallel frequent item set mining algorithm
Ling Yuan, Dan Li, Yuzhong Chen
2016 International Conference on Audio, Language and Image Processing (ICALIP), published 2016-07-01
DOI: 10.1109/ICALIP.2016.7846585 (https://doi.org/10.1109/ICALIP.2016.7846585)
Citations: 1
Abstract
Associative data mining is a research hotspot in the field of big data, and frequent item set mining is an important step in the analysis of associative data. This paper analyzes frequent item set mining based on the parallel Apriori algorithm. We identify two shortcomings of the parallel Apriori algorithm: it generates too many key-value pairs, and it occupies too much memory in the combiner stage. We therefore propose an optimized algorithm. In the optimized algorithm, candidate item sets and local count information are kept in memory, which greatly reduces the number of generated keys. In addition, when mining short frequent item sets, the algorithm reduces the number of scans over the transaction data and does not generate candidate item sets, which improves efficiency. We run experiments on the Hadoop platform to evaluate the performance of the proposed optimized algorithm. The experiments show that the optimized algorithm greatly improves running time and I/O compared with the non-optimized algorithm.
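
The key-reduction idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a plain-Python simulation of a map and reduce phase, written under the assumption that "saving local count information in memory" resembles the common in-mapper aggregation pattern. The function names (`naive_map`, `in_memory_map`, `reduce_counts`) and the toy transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def naive_map(split, candidates):
    """Baseline: emit one (candidate, 1) pair per match in every transaction."""
    pairs = []
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is contained in the transaction
                pairs.append((cand, 1))
    return pairs


def in_memory_map(split, candidates):
    """Assumed optimization: keep local counts in memory and emit each
    candidate once per input split, with its accumulated count."""
    local = Counter()
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                local[cand] += 1
    return list(local.items())


def reduce_counts(all_pairs, min_support):
    """Sum the counts per candidate and keep those meeting min_support."""
    totals = Counter()
    for cand, n in all_pairs:
        totals[cand] += n
    return {cand: n for cand, n in totals.items() if n >= min_support}


# Two hypothetical input splits, each handled by one simulated mapper.
splits = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["b", "c"], ["a", "b", "c"], ["a", "b"]],
]
# Candidate 2-item sets over the items a, b, c.
candidates = [frozenset(p) for p in combinations(["a", "b", "c"], 2)]

naive_pairs = [p for s in splits for p in naive_map(s, candidates)]
local_pairs = [p for s in splits for p in in_memory_map(s, candidates)]

print(len(naive_pairs), len(local_pairs))
print(reduce_counts(local_pairs, min_support=3))
```

Both mappers lead to the same support counts after the reduce step, but the in-memory variant emits at most one pair per candidate per split instead of one pair per match, so the number of pairs sent to the shuffle is smaller. The trade-off in this sketch is that each mapper must hold a count table for its candidates in memory.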