A fast parallel algorithm for discovering frequent patterns

2009 IEEE International Conference on Granular Computing
Pub date: 2009-09-22
DOI: 10.1109/GRC.2009.5255089

K. W. Lin, Yu-Chin Luo

Citations: 13

Abstract

Fast discovery of frequent patterns is one of the most extensively discussed problems in data mining because of its wide range of applications. As a database grows, the computation time and memory required to mine it increase sharply. The difficulty of mining large databases has motivated research into parallel and distributed algorithms for the problem. Most past studies parallelize the computation by partitioning the database and distributing the partitions to other nodes for mining. This approach risks leaking data and is therefore unsuitable for sensitive domains such as health care. In this paper, we propose a novel data mining algorithm, FD-Mine, that efficiently utilizes the computing nodes of a cloud computing environment to discover frequent patterns while preserving data privacy. In empirical evaluations under various simulated conditions, FD-Mine delivers excellent performance in terms of scalability and execution time.
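The abstract does not describe how FD-Mine itself works. The sketch below therefore illustrates only the conventional strategy it argues against: the database is partitioned and each partition is shipped to a node for mining. This is not the authors' algorithm. It is a minimal example that assumes an Apriori-style level-wise search with a count-distribution scheme, where each node counts candidate support on its own partition and the local counts are summed. Threads stand in for cluster nodes. The helper names (`partition`, `local_counts`, `frequent_itemsets`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition(db, n_nodes):
    """Split the transaction list into n_nodes interleaved partitions (hypothetical helper)."""
    return [db[i::n_nodes] for i in range(n_nodes)]


def local_counts(part, candidates):
    """On one 'node', count the transactions in its partition that contain each candidate."""
    counts = Counter()
    for transaction in part:
        for cand in candidates:
            if cand <= transaction:  # subset test on frozensets
                counts[cand] += 1
    return counts


def frequent_itemsets(db, min_support, n_nodes=2):
    """Level-wise (Apriori-style) search with support counting split across partitions.

    db is a list of frozensets; min_support is an absolute transaction count.
    Returns a dict mapping each frequent itemset to its global support.
    """
    parts = partition(db, n_nodes)
    candidates = [frozenset([item]) for item in {i for t in db for i in t}]
    result = {}
    k = 1
    # Threads stand in for cluster nodes; each one receives a raw data partition.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        while candidates:
            # Global support of a candidate is the sum of its local counts.
            total = Counter()
            for counts in pool.map(local_counts, parts, [candidates] * n_nodes):
                total.update(counts)
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            k += 1
            # Join frequent (k-1)-itemsets into k-itemsets, keeping only those
            # whose every (k-1)-subset is frequent (the Apriori pruning rule).
            next_level = set()
            for a, b in combinations(frequent, 2):
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)
                ):
                    next_level.add(union)
            candidates = list(next_level)
    return result


if __name__ == "__main__":
    db = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                                 {"b", "c"}, {"a", "b", "c"})]
    for itemset, support in sorted(frequent_itemsets(db, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

On this five-transaction example with a minimum support of 3, the sketch reports the single items a, b and c (support 4 each) and the pairs {a, b}, {a, c} and {b, c} (support 3 each). It discards {a, b, c}, which appears in only two transactions. Every node here receives raw transactions. That is exactly the data-exposure concern the abstract raises about partition-based approaches.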

Source journal

2009 IEEE International Conference on Granular Computing

Self-citation rate

0.00%

Publication count