INFIN: An Efficient Algorithm for Fast Mining Frequent Itemsets

Shaopeng Wang, Yufei Wang, Chunkai Feng, ChaoYu Niu
Published in: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)
Publication date: 2021-07-16
DOI: 10.1109/PRML52754.2021.9520736 (https://doi.org/10.1109/PRML52754.2021.9520736)
Citation count: 1

Abstract

negFIN is the current state-of-the-art algorithm for frequent itemset mining. It employs a novel BMC (bitmap code) encoding model for the nodes of a prefix tree, based on the bitmap representation of sets. The encoding of each node is a binary number whose bit length equals the number of frequent items, and it is stored as a decimal integer. The key operations of negFIN are all performed as bitwise operations on this encoding. The main problem with BMC is that the widest integer data type available in current general-purpose compilers has 64 bits, so the encoding cannot work effectively when the number of frequent items exceeds 64. In this work, we propose B-BMC (block bitmap code), a more efficient encoding model. In essence, B-BMC divides a BMC into blocks of a given block size. To support B-BMC, the B-BMC tree and the TNC (terminal node code) table are devised as an alternative to the BMC tree of negFIN. Based on these two structures, we present an efficient algorithm called INFIN (improved negFIN) for mining frequent itemsets. Our experiments show that B-BMC overcomes the drawback of BMC, and that INFIN is the most efficient algorithm in time and space when the block size is 64 and the number of frequent items exceeds 64.
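
The difference between the two encodings can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the block layout (block k holds items k*64 to k*64+63), and the subset test are assumptions made for illustration, since the abstract does not specify them. Python integers are arbitrary-precision, so the 64-bit limit is only simulated here by checking the size of the resulting code.

```python
from typing import Iterable, List

BLOCK_SIZE = 64  # block size the abstract reports as the most efficient setting


def encode_bmc(item_indices: Iterable[int]) -> int:
    """Single-integer bitmap code: bit i is set when frequent item i is present."""
    code = 0
    for i in item_indices:
        code |= 1 << i
    return code


def encode_bbmc(item_indices: Iterable[int], num_items: int,
                block_size: int = BLOCK_SIZE) -> List[int]:
    """Block bitmap code (assumed layout): the same bitmap cut into fixed-width blocks."""
    num_blocks = (num_items + block_size - 1) // block_size  # ceiling division
    blocks = [0] * num_blocks
    for i in item_indices:
        if not 0 <= i < num_items:
            raise ValueError(f"item index {i} is outside 0..{num_items - 1}")
        blocks[i // block_size] |= 1 << (i % block_size)
    return blocks


def is_subset(a: List[int], b: List[int]) -> bool:
    """True when every item encoded in a is also encoded in b (block-wise AND)."""
    return len(a) == len(b) and all((x & y) == x for x, y in zip(a, b))


# 130 frequent items: a single code needs more than 64 bits,
# while every block of the block code fits in a 64-bit word.
single = encode_bmc([0, 70, 129])
blocks = encode_bbmc([0, 70, 129], num_items=130)
fits_in_64_bits = single < 2 ** 64
all_blocks_fit = all(b < 2 ** 64 for b in blocks)
```

With 130 frequent items, the single-integer code for items 0, 70 and 129 no longer fits in a 64-bit word, whereas the block code stores the same information in three words, and bitwise operations such as the subset test run block by block.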