Efficient indexing structures for mining frequent patterns

Bin Lan, B. Ooi, K. Tan
Proceedings 18th International Conference on Data Engineering
Publication date: 2002-02-26
DOI: 10.1109/ICDE.2002.994758 (https://doi.org/10.1109/ICDE.2002.994758)
Citations: 16

Abstract

In this paper, we propose a variant of the signature file, called the bit-sliced Bloom-filtered signature file (BBS), as the basis for implementing filter-and-refine strategies for mining frequent patterns. In the filtering step, candidate patterns are obtained by scanning the BBS instead of the database. The resulting candidate set is a superset of the frequent patterns. In the refinement step, each algorithm prunes the false drops from the candidate set. Based on this indexing structure, we study two filtering mechanisms (single and dual filter) and two refinement mechanisms (sequential scan and probe), giving rise to four different strategies. We conducted an extensive performance study to evaluate the effectiveness of BBS, and compared the four proposed processing schemes with the traditional Apriori algorithm and the recently proposed FP-tree scheme. Our results show that BBS, as a whole, outperforms the Apriori strategy. Moreover, the scheme based on dual filter and probe refinement performs best in all cases.
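
The filter-and-refine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `BitSlicedSignatureFile`, the SHA-256 based hashing, the slice count, and the brute-force candidate enumeration are all assumptions made for illustration. Each bit slice is stored as a Python integer whose bit `t` is set when transaction `t` sets that signature position. ANDing the slices for an itemset gives an upper bound on its support (the filter), and only the transactions that survive the AND are checked against the database (a probe-style refinement that removes false drops).

```python
import hashlib
from itertools import combinations


class BitSlicedSignatureFile:
    """Hypothetical bit-sliced Bloom-filter index over a list of transactions."""

    def __init__(self, num_slices=64, num_hashes=2):
        self.m = num_slices            # signature width (number of bit slices)
        self.k = num_hashes            # hash positions per item (at most 8 here)
        self.slices = [0] * num_slices # slice j: bit t set if transaction t sets position j
        self.n = 0                     # number of indexed transactions

    def _positions(self, item):
        # Assumed hashing scheme: take k 4-byte chunks of a SHA-256 digest.
        digest = hashlib.sha256(repr(item).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, transaction):
        tid = self.n
        for item in transaction:
            for p in self._positions(item):
                self.slices[p] |= 1 << tid
        self.n += 1

    def candidate_rows(self, itemset):
        # AND all slices touched by the itemset; the result is a bitmap of
        # transactions that MAY contain it (false drops are possible).
        rows = (1 << self.n) - 1
        for item in itemset:
            for p in self._positions(item):
                rows &= self.slices[p]
        return rows


def exact_support(db, rows, itemset):
    """Refinement by probing: check only the transactions flagged in rows."""
    wanted = set(itemset)
    count, tid = 0, 0
    while rows:
        if rows & 1 and wanted <= db[tid]:
            count += 1
        rows >>= 1
        tid += 1
    return count


def mine(db, min_support, max_size=2):
    """Filter on the index, then refine against the database."""
    index = BitSlicedSignatureFile()
    for transaction in db:
        index.add(transaction)
    items = sorted({item for transaction in db for item in transaction})
    frequent = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            rows = index.candidate_rows(itemset)
            if bin(rows).count("1") < min_support:
                continue  # the upper bound is already too low, so pruning is safe
            support = exact_support(db, rows, itemset)
            if support >= min_support:
                frequent[itemset] = support
    return frequent
```

Because the filter can only overestimate support, pruning on it never discards a frequent pattern; hash collisions affect how much work the refinement step does, not the final result. A realistic miner would generate candidates level by level rather than enumerating every combination, which this sketch does only to stay short.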