Towards Efficient Closed Infrequent Itemset Mining Using Bi-Directional Traversing

Yifeng Lu, T. Seidl
Published in: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), 2018-10-01. DOI: 10.1109/DSAA.2018.00024 (https://doi.org/10.1109/DSAA.2018.00024)

Abstract

In this work, we investigate the inverse of the frequent itemset mining question: which patterns occur less often than a given minimum support in a transactional database? This problem, known as infrequent itemset mining, is important in fields such as medical science, security, finance, and scientific research. Frequent patterns represent expected or obvious information, whereas infrequent patterns capture unexpected behavior and are more interesting in some applications. For example, health care needs to identify sporadic but lethal crossover effects, and security agents have to uncover infrequent associative fraud indicators. Existing infrequent itemset mining approaches are time-consuming. Furthermore, extracting all infrequent patterns can produce highly redundant results. In this paper, we study the two factors that affect the performance of itemset mining tasks. The concept of closed itemsets is applied to infrequent patterns to reduce the number of returned patterns. We propose an efficient closed infrequent itemset mining approach that combines bottom-up and top-down traversal strategies. Extensive experimental results show that a simple algorithm based on our framework, without advanced data structures or pruning techniques, can still be significantly more efficient than other approaches.
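To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what a closed infrequent itemset is. It is not the bi-directional algorithm proposed in the paper, whose details the abstract does not give. It assumes the standard definition of closedness (an itemset is closed if no proper superset has the same support) and additionally assumes that only itemsets occurring in at least one transaction are of interest. The function names and the toy database are hypothetical.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersect all transactions containing `itemset` (assumes support >= 1)."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering)


def closed_infrequent_itemsets(transactions, min_support):
    """Brute-force enumeration; exponential in the number of distinct items.

    Assumption (not from the paper): an itemset qualifies if
    1 <= support < min_support and it equals its own closure.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count == 0 or count >= min_support:
                continue  # absent from the data, or frequent
            # Closed means no proper superset shares the same support,
            # which is equivalent to the itemset equalling its closure.
            if closure(candidate, transactions) == candidate:
                found[candidate] = count
    return found


# Hypothetical toy database of five transactions.
example_transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
    {"b", "d"},
]
result = closed_infrequent_itemsets(example_transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
# prints:
# ['a', 'b', 'c'] 1
# ['b', 'd'] 1
```

In this toy database, {d} and {b, c} are also infrequent, but each has a superset with the same support ({b, d} and {a, b, c} respectively), so they are dropped as redundant. This illustrates how closedness reduces the number of returned patterns.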