Explore maximal frequent itemsets for big data pre-processing based on small sample in cloud computing

Gaochao Xu, Yan Ding, Chunyi Wu, Yunan Zhai, Jia Zhao
DOI: 10.1109/ICUMT.2016.7765363 (https://doi.org/10.1109/ICUMT.2016.7765363)
Published in: 2016 8th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)
Publication date: 2016-10-01
Citations: 4

Abstract

Data mining for big data processing in cloud computing has become a hot research topic. Most previous work analyzes the data directly with existing mining approaches, which can lead to redundant computation, high time complexity, and large storage requirements. To address this, a novel heuristic approach called PASS (Pre-processing based on Small Sample) is proposed for finding a small sample composed of the most frequent transactions during big data pre-processing. Taking advantage of cloud computing, which can relieve the bottleneck of data mining in a distributed environment, PASS operates directly on the transaction database and groups all transactions according to different dimensions. Using Bitmap-Sort, the most frequent transactions are screened from each transaction set. Finally, the best-transaction-set is obtained by aggregating the transaction-elects of every transaction set. The experimental results show that PASS avoids producing the large number of candidate sets that result from join operations, accelerates maximal frequent itemset mining, saves storage space, and improves resource utilization.
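The abstract names three steps (group transactions by dimension, screen the most frequent transactions in each group, aggregate the elects) but gives no algorithmic detail. The following is only a minimal sketch of that outline under stated assumptions, not the authors' PASS implementation: "dimension" is assumed to mean the number of items in a transaction, a bitmask encoding plus an ordinary sort stands in for Bitmap-Sort, and the names `encode`, `decode`, `small_sample`, and `top_k` are hypothetical.

```python
from collections import Counter, defaultdict


def encode(transaction, item_index):
    """Encode a transaction as an integer bitmap: bit i is set if item i occurs."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def decode(mask, items):
    """Turn a bitmap back into the set of items it represents."""
    return frozenset(item for i, item in enumerate(items) if (mask >> i) & 1)


def small_sample(transactions, top_k=1):
    """Return the top_k most frequent transactions of each dimension.

    Assumption: a transaction's "dimension" is its number of distinct items.
    This is an illustrative stand-in, not the paper's Bitmap-Sort.
    """
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}

    # Step 1: group transaction bitmaps by dimension and count duplicates.
    groups = defaultdict(Counter)
    for t in transactions:
        mask = encode(t, item_index)
        dimension = bin(mask).count("1")  # number of set bits
        groups[dimension][mask] += 1

    # Step 2: within each group, rank by frequency (ties broken by bitmap value).
    # Step 3: aggregate the per-group winners into one small sample.
    sample = []
    for dimension in sorted(groups):
        ranked = sorted(groups[dimension].items(), key=lambda kv: (-kv[1], kv[0]))
        for mask, count in ranked[:top_k]:
            sample.append((decode(mask, items), count))
    return sample


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["b", "c"],
          ["a", "b", "c"], ["a", "b", "c"], ["a", "c", "d"], ["d"]]
    for itemset, count in small_sample(db):
        print(sorted(itemset), count)
```

A downstream maximal frequent itemset miner would then run on this much smaller sample instead of the full database, which is the kind of saving the abstract claims; how PASS actually selects and merges the elects may differ from this sketch.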