{"title":"A fast parallel algorithm for discovering frequent patterns","authors":"K. W. Lin, Yu-Chin Luo","doi":"10.1109/GRC.2009.5255089","DOIUrl":null,"url":null,"abstract":"Fast discovery of frequent patterns is the most extensively discussed problem in data mining fields due to its wide applications. As the size of database increases, the computation time and the required memory increase severely. The difficulty of mining large database launched the research of designing parallel and distributed algorithms to solve the problem. Most of the past studies tried to parallelize the computation by dividing the database and distribute the divided database to other nodes for mining. This approach might leak data out and evidently is not suitable to be applied to sensitive domains like health-care. In this paper, we propose a novel data mining algorithm named FD-Mine that is able to efficiently utilize the nodes to discover frequent patterns in cloud computing environments with data privacy preserved. Through empirical evaluations on various simulation conditions, the proposed FD-Mine delivers excellent performance in terms of scalability and execution time.","PeriodicalId":388774,"journal":{"name":"2009 IEEE International Conference on Granular Computing","volume":"148 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Conference on Granular Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GRC.2009.5255089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
Fast discovery of frequent patterns is the most extensively discussed problem in data mining fields due to its wide applications. As the size of database increases, the computation time and the required memory increase severely. The difficulty of mining large database launched the research of designing parallel and distributed algorithms to solve the problem. Most of the past studies tried to parallelize the computation by dividing the database and distribute the divided database to other nodes for mining. This approach might leak data out and evidently is not suitable to be applied to sensitive domains like health-care. In this paper, we propose a novel data mining algorithm named FD-Mine that is able to efficiently utilize the nodes to discover frequent patterns in cloud computing environments with data privacy preserved. Through empirical evaluations on various simulation conditions, the proposed FD-Mine delivers excellent performance in terms of scalability and execution time.