An Improved MapReduce Algorithm for Mining Closed Frequent Itemsets
Yaron Gonen, E. Gudes
2016 IEEE International Conference on Software Science, Technology and Engineering (SWSTE), published 2016-06-23
DOI: 10.1109/SWSTE.2016.19
Citations: 4
Abstract
Mining closed frequent itemsets is a key objective in data mining because of its wide range of applications. Given a database of transactions, the task is to find the closed itemsets that appear in at least a given number of transactions. The subject has been studied thoroughly and many efficient algorithms have been presented; however, most of them were designed for a non-distributed setting. The exponential growth of data now forces it to be stored in a distributed setting, which means that most of these algorithms no longer apply. MapReduce is a widely adopted programming paradigm for processing large-scale distributed data. In this paper we present an efficient algorithm for mining closed frequent itemsets using the MapReduce paradigm. Beyond its novelty of running in a distributed setting, the algorithm also makes the duplicate-elimination step, which is common to all existing algorithms, redundant.
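To make the problem definition concrete, the following is a small single-machine sketch of what "closed frequent itemset" means. It is not the authors' MapReduce algorithm, which the abstract does not describe. It is a brute-force reference that enumerates every itemset and keeps those whose support meets a threshold and that equal their closure (the intersection of all transactions containing them). That is equivalent to saying no proper superset has the same support. The function names `closure` and `closed_frequent_itemsets`, and the `min_support` parameter, are illustrative assumptions. The enumeration is exponential in the number of distinct items, so it only suits toy data.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersection of all transactions that contain `itemset`.

    Illustrative helper (hypothetical name). If no transaction contains
    the itemset, the itemset itself is returned unchanged.
    """
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering) if covering else frozenset(itemset)


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force reference (not a MapReduce algorithm).

    Returns {itemset: support} for every itemset X with
    support(X) >= min_support and closure(X) == X.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            # Support = number of transactions containing every item of x.
            s = sum(1 for t in transactions if x <= t)
            # x is closed iff no proper superset has the same support,
            # which is equivalent to x being equal to its closure.
            if s >= min_support and closure(x, transactions) == x:
                result[x] = s
    return result


if __name__ == "__main__":
    tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
    for itemset, sup in closed_frequent_itemsets(tx, 2).items():
        print(sorted(itemset), sup)
```

In this toy database, {a} and {b} each appear in 3 transactions but are not closed, because their superset {a, b} has the same support. The closed frequent itemsets at threshold 2 are {a, b} (support 3), {c} (support 2), and {d} (support 2).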