Gaochao Xu, Yan Ding, Chunyi Wu, Yunan Zhai, Jia Zhao
2016 8th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), published 2016-10-01. DOI: 10.1109/ICUMT.2016.7765363 (https://doi.org/10.1109/ICUMT.2016.7765363)
Explore maximal frequent itemsets for big data pre-processing based on small sample in cloud computing
Data mining for cloud-based big data processing has become an active research topic. Most previous work applies existing mining approaches directly to the data, which can cause redundant computation, high time complexity, and large storage requirements. To address this, a novel heuristic approach called PASS (Pre-processing based on Small Sample) is proposed; during big data pre-processing, it finds a small sample composed of the most frequent transactions. Taking advantage of cloud computing, which can relieve the bottleneck of data mining in a distributed environment, PASS operates directly on the transaction database and groups all transactions according to their dimensions. Using Bitmap-Sort, the most frequent transactions are then screened from each transaction set. Finally, the best-transaction-set is obtained by aggregating the transaction-elects of all transaction sets. Experimental results show that PASS largely avoids generating the many candidate sets that join operations produce, accelerates maximal frequent itemset mining, saves storage space, and improves resource utilization.
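The abstract does not give the details of PASS, so the following is only a minimal single-machine sketch of the general idea (group, rank, aggregate), not the authors' implementation. It assumes that a transaction's "dimension" is its number of items, that Bitmap-Sort can be approximated by encoding each transaction as an integer bitmap and ranking identical bitmaps by count, and that the "transaction-elects" are the `top_k` most frequent transactions in each group. The names `encode`, `decode`, `small_sample`, and `top_k` are hypothetical.

```python
from collections import Counter, defaultdict


def encode(transaction, item_index):
    """Encode a transaction as an integer bitmap: bit i is set if item i is present."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def decode(mask, items):
    """Recover the item set from an integer bitmap."""
    return frozenset(item for i, item in enumerate(items) if (mask >> i) & 1)


def small_sample(transactions, top_k=1):
    """Return (transaction, count) pairs: the top_k most frequent
    transactions of each dimension, in order of increasing dimension."""
    items = sorted({item for t in transactions for item in t})
    item_index = {item: i for i, item in enumerate(items)}

    # Assumption: "dimension" means the number of items in a transaction.
    groups = defaultdict(Counter)
    for t in transactions:
        mask = encode(t, item_index)
        dimension = bin(mask).count("1")
        groups[dimension][mask] += 1

    sample = []
    for dimension in sorted(groups):
        # Rank by descending count; break ties by bitmap value for determinism.
        ranked = sorted(groups[dimension].items(), key=lambda kv: (-kv[1], kv[0]))
        for mask, count in ranked[:top_k]:
            sample.append((decode(mask, items), count))
    return sample


if __name__ == "__main__":
    transactions = [
        {"a", "b"}, {"a", "b"}, {"b", "c"},
        {"a"}, {"a"}, {"c"},
        {"a", "b", "c"},
    ]
    for itemset, count in small_sample(transactions, top_k=1):
        print(sorted(itemset), count)
```

In this sketch the reduced sample, rather than the full transaction database, would be handed to a maximal frequent itemset miner. A distributed version would presumably shard the grouping and counting across cloud nodes, but the abstract does not say how.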