An Efficient Association Rule Mining Algorithm In Distributed Databases

First International Workshop on Knowledge Discovery and Data Mining (WKDD 2008) · Pub Date: 2008-01-23 · DOI: 10.1109/WKDD.2008.33

Wu Jian, L. Ming


Citations: 20

Abstract

This paper addresses alarm correlation in communication networks using data mining. Applying sequential algorithms directly to distributed databases is ineffective because it incurs a large communication overhead. We propose an efficient algorithm, EDMA, that minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an improved algorithm, CMatrix, to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, (x, y), which compresses the transmitted content and is used to query the corresponding support counts in CMatrix. Our solution also reduces the average transaction size and the dataset size, which reduces scan time. The performance study shows that EDMA has higher running efficiency, lower communication cost, and better scalability than the direct application of a sequential algorithm to distributed databases.
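The abstract does not give EDMA's details, so the following is only a minimal Python sketch of the pair-encoding idea under stated assumptions. It assumes candidates come from a standard Apriori-style join (two frequent k-itemsets that share their first k-1 items), and that global pruning sums per-site counts and compares them against a minimum count. The function names `encode_candidates` and `global_prune` are hypothetical, and the CMatrix structure itself is not modeled.

```python
from itertools import combinations


def encode_candidates(frequent_k):
    """Number the frequent k-itemsets 1..m and encode each candidate
    (k+1)-itemset as the pair (x, y) of the two itemsets that form it.

    Assumption: Apriori-style join and subset pruning, not necessarily
    the exact procedure used by EDMA.
    """
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    index = {s: i for i, s in enumerate(ordered, start=1)}
    frequent = set(ordered)
    encoded = {}
    for a, b in combinations(ordered, 2):
        # Join step: both itemsets must share their first k-1 items.
        if a[:-1] != b[:-1]:
            continue
        candidate = a + (b[-1],)
        # Prune step: every k-subset of the candidate must be frequent.
        if all(sub in frequent for sub in combinations(candidate, len(a))):
            encoded[(index[a], index[b])] = candidate
    return index, encoded


def global_prune(site_counts, min_count):
    """Sum per-site support counts keyed by (x, y) and keep the pairs
    whose total reaches min_count (hypothetical global pruning step)."""
    totals = {}
    for counts in site_counts:
        for pair, count in counts.items():
            totals[pair] = totals.get(pair, 0) + count
    return {pair: c for pair, c in totals.items() if c >= min_count}


frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
index, encoded = encode_candidates(frequent_2)
# Sites exchange only the compact (x, y) keys with their local counts.
survivors = global_prune([{(1, 2): 3}, {(1, 2): 2}], min_count=4)
```

In this sketch each site would transmit a pair of small integers instead of the full item list, which is one plausible reading of how the encoding compresses the exchanged messages.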

