An Efficient Association Rule Mining Algorithm In Distributed Databases

Wu Jian, L. Ming
First International Workshop on Knowledge Discovery and Data Mining (WKDD 2008). Published 2008-01-23. DOI: 10.1109/WKDD.2008.33 (https://doi.org/10.1109/WKDD.2008.33)
Citations: 20

Abstract

This paper addresses alarm correlation in communication networks using data mining. Applying a sequential algorithm directly to distributed databases is ineffective because it incurs heavy communication overhead. We propose an efficient algorithm, EDMA, that minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an improved algorithm, CMatrix, to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, written (x, y), to compress the transmitted content and to query the corresponding support counts in CMatrix. The approach also reduces the average transaction size and the dataset size, which shortens scan time. The performance study shows that EDMA runs more efficiently, incurs lower communication cost, and scales better than a direct application of a sequential algorithm to distributed databases.
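The pair encoding and global pruning described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify how CMatrix stores counts or how candidates are joined, so the sketch assumes a standard Apriori-style join (two k-itemsets that share their first k-1 items) and counts support by brute force. All function names are hypothetical, and local pruning and Apriori subset pruning are omitted for brevity.

```python
from itertools import combinations


def number_itemsets(frequent_k):
    """Assign ids 1..m to the global frequent k-itemsets, in sorted order."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    return {itemset: i for i, itemset in enumerate(ordered, start=1)}


def encode_candidates(ids):
    """Assumed Apriori-style join: two k-itemsets sharing their first k-1
    items form a (k+1)-candidate, represented by the id pair (x, y)."""
    encoded = {}
    for (a, x), (b, y) in combinations(sorted(ids.items()), 2):
        if a[:-1] == b[:-1]:
            encoded[(x, y)] = a + (b[-1],)
    return encoded


def local_counts(encoded, transactions):
    """Brute-force local support counts, keyed by the compact (x, y) pair.
    (A stand-in for CMatrix, whose internals the abstract does not give.)"""
    return {
        pair: sum(1 for t in transactions if set(itemset) <= t)
        for pair, itemset in encoded.items()
    }


def global_prune(per_site_counts, min_count):
    """Sum the counts reported by every site and keep the pairs whose
    global support count reaches min_count."""
    totals = {}
    for counts in per_site_counts:
        for pair, c in counts.items():
            totals[pair] = totals.get(pair, 0) + c
    return {pair: c for pair, c in totals.items() if c >= min_count}


# Toy data: four global frequent 2-itemsets and two sites.
ids = number_itemsets([{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "d"}])
encoded = encode_candidates(ids)
site1 = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c"}]
site2 = [{"a", "b", "c"}, {"b", "c"}]
survivors = global_prune(
    [local_counts(encoded, site1), local_counts(encoded, site2)],
    min_count=2,
)
```

In this sketch each site would send only the (x, y) pairs and their counts, never the full itemsets, since every site can rebuild the itemset from the shared numbering. That is one plausible reading of how the encoding reduces message size.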