Connections between Mining Frequent Itemsets and Learning Generative Models

S. Laxman, P. Naldurg, Raja Sripada, R. Venkatesan
{"title":"Connections between Mining Frequent Itemsets and Learning Generative Models","authors":"S. Laxman, P. Naldurg, Raja Sripada, R. Venkatesan","doi":"10.1109/ICDM.2007.83","DOIUrl":null,"url":null,"abstract":"Frequent itemsets mining is a popular framework for pattern discovery. In this framework, given a database of customer transactions, the task is to unearth all patterns in the form of sets of items appearing in a sizable number of transactions. We present a class of models called Itemset Generating Models (or IGMs) that can be used to formally connect the process of frequent item- sets discovery with the learning of generative models. IGMs are specified using simple probability mass functions (over the space of transactions), peaked at specific sets of items and uniform everywhere else. Under such a connection, it is possible to rigorously associate higher frequency patterns with generative models that have greater data likelihoods. This enables a generative model-learning interpretation of frequent itemsets mining. More importantly, it facilitates a statistical significance test which prescribes the minimum frequency needed for a pattern to be considered interesting. We illustrate the effectiveness of our analysis through experiments on standard benchmark data sets.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.83","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Frequent itemsets mining is a popular framework for pattern discovery. In this framework, given a database of customer transactions, the task is to unearth all patterns in the form of sets of items appearing in a sizable number of transactions. We present a class of models called Itemset Generating Models (or IGMs) that can be used to formally connect the process of frequent item- sets discovery with the learning of generative models. IGMs are specified using simple probability mass functions (over the space of transactions), peaked at specific sets of items and uniform everywhere else. Under such a connection, it is possible to rigorously associate higher frequency patterns with generative models that have greater data likelihoods. This enables a generative model-learning interpretation of frequent itemsets mining. More importantly, it facilitates a statistical significance test which prescribes the minimum frequency needed for a pattern to be considered interesting. We illustrate the effectiveness of our analysis through experiments on standard benchmark data sets.
挖掘频繁项集与学习生成模型之间的联系
频繁项集挖掘是一种流行的模式发现框架。在这个框架中,给定一个客户事务数据库,任务是以大量事务中出现的项目集的形式挖掘所有模式。我们提出了一类被称为项目集生成模型(或igm)的模型,它可以用来形式化地将频繁项目集发现过程与生成模型的学习联系起来。igm使用简单的概率质量函数(在交易空间上)来指定,在特定的项目集合上达到峰值,在其他地方保持一致。在这种联系下,有可能将更高频率的模式与具有更大数据可能性的生成模型严格关联。这使得频繁项集挖掘的生成模型学习解释成为可能。更重要的是,它促进了统计显著性测试,该测试规定了认为有趣的模式所需的最小频率。我们通过在标准基准数据集上的实验来说明我们的分析的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信