Explore maximal frequent itemsets for big data pre-processing based on small sample in cloud computing

2016 8th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT). Pub Date: 2016-10-01. DOI: 10.1109/ICUMT.2016.7765363

Gaochao Xu, Yan Ding, Chunyi Wu, Yunan Zhai, Jia Zhao

Citations: 4

Abstract

Data mining for cloud-based big data processing has become an active research topic. Most previous work analyzes the data directly with existing mining approaches, which can cause redundant computation, high time complexity, and large storage requirements. To address these problems, a novel heuristic approach called PASS (Pre-processing based on Small Sample) is proposed; during big data pre-processing, it finds a small sample composed of the most frequent transactions. Taking advantage of cloud computing, which can overcome the bottleneck of data mining in distributed environments, PASS operates directly on the transaction database and groups all transactions according to different dimensions. Using Bitmap-Sort, it screens the most frequent transactions from each transaction set. Finally, it obtains the best-transaction-set by aggregating the transaction-elects of all transaction sets. Experimental results show that PASS avoids generating the large number of candidate sets produced by join operations, accelerates maximal frequent itemset mining, saves storage space, and improves resource utilization.
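The abstract does not define dimensions, Bitmap-Sort, or transaction-elects precisely, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions. It assumes that a transaction's "dimension" is its number of items. It also assumes that Bitmap-Sort ranks each group's distinct transaction bitmaps by support, meaning the number of database transactions that contain them, tested with a bitwise AND, and that the top `k` of each group are that group's transaction-elects. The names `to_bitmap`, `support`, `pass_sample`, `k`, and `min_support` are hypothetical. The last step keeps only the frequent elects that are not contained in another frequent elect. That gives the maximal set among the sampled candidates, which is not the same as a complete maximal-frequent-itemset miner.

```python
from collections import defaultdict


def to_bitmap(transaction, index):
    """Encode a transaction (a set of items) as an integer bitmask."""
    bits = 0
    for item in transaction:
        bits |= 1 << index[item]
    return bits


def support(pattern, bitmaps):
    """Count the transactions whose bitmap contains every bit set in `pattern`."""
    return sum(1 for b in bitmaps if (b & pattern) == pattern)


def pass_sample(transactions, k, min_support):
    """Hypothetical PASS-style sketch: group, Bitmap-Sort, aggregate, filter."""
    # Give every distinct item one bit position.
    items = sorted({item for t in transactions for item in t})
    index = {item: pos for pos, item in enumerate(items)}
    bitmaps = [to_bitmap(t, index) for t in transactions]

    # Assumption: a transaction's "dimension" is its item count.
    groups = defaultdict(set)
    for b in bitmaps:
        groups[bin(b).count("1")].add(b)

    # Assumed Bitmap-Sort: rank each group's distinct bitmaps by support
    # (ties broken by bitmap value) and keep the top k as transaction-elects.
    best_transaction_set = set()
    for members in groups.values():
        ranked = sorted(members, key=lambda b: (-support(b, bitmaps), b))
        best_transaction_set.update(ranked[:k])

    # Keep frequent elects that are not a proper subset of another frequent elect.
    frequent = [b for b in best_transaction_set if support(b, bitmaps) >= min_support]
    maximal = [b for b in frequent
               if not any(o != b and (o & b) == b for o in frequent)]

    # Decode bitmaps back to sorted item tuples.
    return sorted(tuple(items[i] for i in range(len(items)) if (b >> i) & 1)
                  for b in maximal)
```

For example, take the six transactions `{a,b,c}`, `{a,b}`, `{a,b,c}`, `{b,c}`, `{a,c}`, and `{a,b,c,d}`. With these, `pass_sample(transactions, k=1, min_support=2)` elects `{a,b,c}`, `{a,b}`, and `{a,b,c,d}`, and returns `[("a", "b", "c")]`. This sketch processes the groups one after another. In the cloud setting the paper targets, a separate worker could presumably handle each group.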

Source journal

2016 8th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)

Self-citation rate

0.00%

Publications