关联规则挖掘算法有多好?

Vikram Pudi, J. Haritsa
{"title":"关联规则挖掘算法有多好?","authors":"Vikram Pudi, J. Haritsa","doi":"10.1109/ICDE.2002.994730","DOIUrl":null,"url":null,"abstract":"Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an \"Oracle algorithm\" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present anew mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and is guaranteed to complete in two passes over the database. This is in marked contrast to the earlier approaches which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"How good are association-rule mining algorithms?\",\"authors\":\"Vikram Pudi, J. Haritsa\",\"doi\":\"10.1109/ICDE.2002.994730\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an \\\"Oracle algorithm\\\" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present anew mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and is guaranteed to complete in two passes over the database. This is in marked contrast to the earlier approaches which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.\",\"PeriodicalId\":191529,\"journal\":{\"name\":\"Proceedings 18th International Conference on Data Engineering\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 18th International Conference on Data Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2002.994730\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 18th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2002.994730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

解决了与当前关联规则挖掘算法相比,还有多少空间用于性能改进的问题。我们的方法是将它们的性能与“Oracle算法”进行比较,该算法提前知道数据库中所有频繁项集的身份,并且只需要在一次数据库扫描中收集这些项集的实际支持,就可以完成挖掘过程。显然,为了生成挖掘规则,任何实用的算法都至少要做这么多的工作。虽然Oracle的概念在概念上很简单,但它的构造并不同样简单。特别是,它严重依赖于计数过程中使用的数据结构和数据库组织的选择。我们提出了一个精心设计的Oracle实现,在计数过程的每个阶段为这些设计参数做出最佳选择。我们还提出了一种新的挖掘算法,称为ARMOR(基于ORacle的关联规则挖掘),其结构是通过对ORacle进行最小的更改而获得的,并保证在两次数据库传递中完成。这与早期的方法形成鲜明对比,这些方法通过尝试解决以前在线算法的局限性来设计新算法。虽然ARMOR源自Oracle,但它分享了先前各种算法的积极特征,如PARTITION, CARMA, as - cpa, VIPER和DELTA。我们的实证研究表明,在真实数据库和合成数据库上,ARMOR的性能始终保持在Oracle的两倍之内。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
How good are association-rule mining algorithms?
Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present anew mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and is guaranteed to complete in two passes over the database. This is in marked contrast to the earlier approaches which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信