Finding Good Itemsets by Packing Data

Nikolaj Tatti, Jilles Vreeken
2008 Eighth IEEE International Conference on Data Mining (ICDM), published 2008-12-15
DOI: 10.1109/ICDM.2008.39 (https://doi.org/10.1109/ICDM.2008.39)
Citations: 42

Abstract

The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may depend only on the preceding items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first is a simple greedy approach that builds a family of itemsets directly from the data. The second, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high-quality descriptions of the data.
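
The idea of one tree per item, restricted to earlier items and scored by compressed size, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes one-level trees (stumps), a plain entropy code length in place of the paper's refined MDL score, and a hypothetical fixed `leaf_penalty` as the model cost. The names `code_length`, `best_split`, and `export_itemsets`, and the rule that a split on item a in the tree for item b yields the itemset {a, b}, are illustrative assumptions only.

```python
import math
from typing import FrozenSet, List, Optional, Sequence, Tuple

Row = Sequence[int]  # one binary transaction; position i holds item i


def code_length(ones: int, total: int) -> float:
    """Bits needed to encode `total` binary values containing `ones` 1s
    with an ideal entropy code (0 for an empty or constant column)."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -(ones * math.log2(p) + (total - ones) * math.log2(1 - p))


def leaf_cost(rows: List[Row], target: int, leaf_penalty: float) -> float:
    """Data cost of the target column in one leaf plus an assumed model cost."""
    ones = sum(r[target] for r in rows)
    return code_length(ones, len(rows)) + leaf_penalty


def best_split(rows: List[Row], target: int,
               leaf_penalty: float = 2.0) -> Tuple[Optional[int], float]:
    """Pick the cheapest one-level tree for `target`.
    Only items that come before `target` in the ordering may be tested."""
    best_item: Optional[int] = None
    best_cost = leaf_cost(rows, target, leaf_penalty)  # no split at all
    for item in range(target):
        zeros = [r for r in rows if r[item] == 0]
        ones = [r for r in rows if r[item] == 1]
        cost = (leaf_cost(zeros, target, leaf_penalty)
                + leaf_cost(ones, target, leaf_penalty))
        if cost < best_cost:
            best_item, best_cost = item, cost
    return best_item, best_cost


def export_itemsets(rows: List[Row]) -> List[FrozenSet[int]]:
    """Assumed export rule: a split on item a in the tree for item b
    yields the itemset {a, b}."""
    itemsets = []
    for target in range(len(rows[0])):
        item, _ = best_split(rows, target)
        if item is not None:
            itemsets.append(frozenset({item, target}))
    return itemsets


# Toy data: item 1 is the negation of item 0, item 2 is always 1.
rows = [(0, 1, 1), (0, 1, 1), (1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 1)]
print(best_split(rows, 1))    # (0, 4.0)
print(export_itemsets(rows))  # [frozenset({0, 1})]
```

In the toy data, items 0 and 1 never occur together, so their interaction is not a co-occurrence of 1s, yet splitting on item 0 still halves the cost of encoding item 1 (from 8 bits to 4 under the assumed penalty). Item 2 is constant and gains nothing from a split, so no itemset is exported for it.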