BIDE: efficient mining of frequent closed sequences

Jianyong Wang, Jiawei Han
Published in: Proceedings. 20th International Conference on Data Engineering
Publication date: 2004-03-30
DOI: 10.1109/ICDE.2004.1319986 (https://doi.org/10.1109/ICDE.2004.1319986)
Citations: 732

Abstract

Previous studies have presented convincing arguments that a frequent pattern mining algorithm should mine only the closed patterns rather than all frequent patterns, because doing so yields not only a more compact yet complete result set but also better efficiency. However, most previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. We present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. BIDE adopts a novel sequence closure checking scheme called bidirectional extension, and it prunes the search space more deeply than previous algorithms by using the BackScan pruning method and the ScanSkip optimization technique. A thorough performance study with both sparse and dense real-life data sets demonstrates that BIDE significantly outperforms the previous algorithms: it consumes one or more orders of magnitude less memory and can be more than an order of magnitude faster. It also scales linearly with database size.
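
To make the notion of a frequent closed sequence concrete, the following Python sketch enumerates frequent sequences by brute force and keeps only the closed ones. It is not BIDE itself: it implements neither BackScan pruning nor the ScanSkip optimization, and it recomputes support from scratch at every step. It assumes each sequence is a string of single-character items and that `min_sup` is at least 1. The closure test tries inserting one item at every position of a pattern, which only loosely mirrors the idea of checking extensions in both directions. All function names and the toy database are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def support(pattern: str, database: List[str]) -> int:
    """Count the database sequences that contain `pattern` as a subsequence."""
    def contains(seq: str) -> bool:
        remaining = iter(seq)
        # `item in remaining` advances the iterator, so order is respected.
        return all(item in remaining for item in pattern)
    return sum(contains(seq) for seq in database)


def frequent_sequences(database: List[str], min_sup: int) -> Dict[str, int]:
    """Enumerate all frequent sequences by naive prefix growth (assumes min_sup >= 1)."""
    alphabet = sorted(set("".join(database)))
    result: Dict[str, int] = {}

    def grow(prefix: str) -> None:
        for item in alphabet:
            candidate = prefix + item
            sup = support(candidate, database)
            if sup >= min_sup:
                result[candidate] = sup
                grow(candidate)

    grow("")
    return result


def is_closed(pattern: str, database: List[str]) -> bool:
    """A pattern is closed if no super-sequence has the same support.

    Support can only drop as a pattern grows, so it is enough to test
    every single-item insertion (before, inside, or after the pattern).
    """
    alphabet = sorted(set("".join(database)))
    sup = support(pattern, database)
    for pos in range(len(pattern) + 1):
        for item in alphabet:
            extended = pattern[:pos] + item + pattern[pos:]
            if support(extended, database) == sup:
                return False
    return True


def closed_frequent_sequences(database: List[str], min_sup: int) -> Dict[str, int]:
    """Filter the frequent sequences down to the closed ones."""
    frequent = frequent_sequences(database, min_sup)
    return {p: s for p, s in frequent.items() if is_closed(p, database)}


if __name__ == "__main__":
    # Hypothetical toy database of four sequences over the items A, B, C.
    toy_db = ["CAABC", "ABCB", "CABC", "ABBCA"]
    print(len(frequent_sequences(toy_db, 2)))
    print(sorted(closed_frequent_sequences(toy_db, 2).items()))
```

On this toy database with `min_sup = 2`, the sketch finds 17 frequent sequences but only six closed ones (AA, ABB, ABC, CA, CB, and CABC), which illustrates why the closed set is the more compact result. Because it rescans the whole database for every candidate, the sketch is suitable only for tiny inputs.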