INFIN: An Efficient Algorithm for Fast Mining Frequent Itemsets
Shaopeng Wang, Yufei Wang, Chunkai Feng, ChaoYu Niu
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), 2021-07-16
DOI: 10.1109/PRML52754.2021.9520736
Citations: 1
Abstract
negFIN is the current state-of-the-art algorithm for frequent itemset mining. It employs a novel BMC (bitmap code) encoding model for the nodes of a prefix tree, based on the bitmap representation of sets. The encoding of each node is a binary number whose bit count equals the number of frequent items, and it is stored as a decimal integer. All key operations of negFIN are bitwise operations on this encoding. The main problem with BMC is that the widest data type available for storing this integer in current general-purpose compilers is 64 bits, so if the number of frequent items exceeds 64, the encoding cannot work effectively. In this work, we propose the B-BMC (block bitmap code) encoding model, a more efficient encoding model. In essence, B-BMC divides the BMC into blocks according to a block size. To support B-BMC, the B-BMC tree and the TNC (terminal node code) table are devised as an alternative to the BMC tree of negFIN. Based on these two structures, we present an efficient algorithm called INFIN (improved negFIN) for mining frequent itemsets. Our experiments show that B-BMC overcomes the drawback of BMC, and that INFIN is the most efficient algorithm in time and space when the block size is 64, provided that the number of frequent items exceeds 64.
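The idea of splitting a bitmap into fixed-width blocks can be illustrated with a minimal sketch. It is not the authors' B-BMC tree or TNC table. It only shows, under stated assumptions, how a set over more than 64 items might be stored as a list of 64-bit blocks and compared with bitwise operations. The function names `encode_blocks` and `is_subset`, the item-index convention (item 0 maps to the lowest bit of block 0), and the bit ordering are all hypothetical choices made for illustration.

```python
from typing import Iterable, List

BLOCK_SIZE = 64  # assumed block width, matching a 64-bit machine word


def encode_blocks(item_indices: Iterable[int], num_items: int,
                  block_size: int = BLOCK_SIZE) -> List[int]:
    """Encode a set of item indices as a list of fixed-width bitmap blocks.

    Hypothetical layout: item i sets bit (i % block_size) of block
    (i // block_size). The paper's actual layout may differ.
    """
    num_blocks = (num_items + block_size - 1) // block_size
    blocks = [0] * num_blocks
    for idx in item_indices:
        if not 0 <= idx < num_items:
            raise ValueError(f"item index {idx} out of range")
        blocks[idx // block_size] |= 1 << (idx % block_size)
    return blocks


def is_subset(a: List[int], b: List[int]) -> bool:
    """Return True if every bit set in a is also set in b, block by block."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same number of blocks")
    return all((x & y) == x for x, y in zip(a, b))


# Usage: 100 frequent items need two 64-bit blocks.
node_code = encode_blocks([0, 63, 64, 99], num_items=100)
candidate = encode_blocks([64], num_items=100)
print(len(node_code), is_subset(candidate, node_code))
```

Because each block stays below 2^64, every per-block operation fits in a single machine word, which is the property a single-integer encoding loses once the number of frequent items exceeds 64.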