Finding Good Itemsets by Packing Data

2008 Eighth IEEE International Conference on Data Mining. Pub Date: 2008-12-15. DOI: 10.1109/ICDM.2008.39

Nikolaj Tatti, Jilles Vreeken

Citations: 42

Abstract

The problem of selecting small groups of itemsets that represent the data well has recently attracted considerable attention. We approach the problem by searching for itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item, and each tree may depend only on the items that precede it in the order. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first is a simple greedy approach that builds a family of itemsets directly from the data. The second, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact, high-quality descriptions of the data.
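The core idea, encoding each item with a tree that may only look at earlier items, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a one-level tree (a single split), scores a split by empirical entropy only, and omits the model-cost term that an MDL score would add for describing the tree itself. The function names `entropy_bits`, `split_cost`, and `best_parent` and the toy `data` are hypothetical.

```python
import math


def entropy_bits(column):
    """Bits to encode a 0/1 column with a code matched to its empirical frequencies."""
    n = len(column)
    ones = sum(column)
    bits = 0.0
    for count in (ones, n - ones):
        if count:
            bits -= count * math.log2(count / n)
    return bits


def split_cost(data, target, parent):
    """Bits to encode column `target` when a one-level tree splits on column `parent`."""
    branches = {0: [], 1: []}
    for row in data:
        branches[row[parent]].append(row[target])
    return sum(entropy_bits(branch) for branch in branches.values() if branch)


def best_parent(data, target):
    """Return (cost, parent) for the cheapest encoding of `target`.

    Only items with a smaller index than `target` are considered, which mirrors
    the ordering constraint. `parent` is None when no split beats the unsplit column.
    """
    best = (entropy_bits([row[target] for row in data]), None)
    for parent in range(target):
        cost = split_cost(data, target, parent)
        if cost < best[0]:
            best = (cost, parent)
    return best


# Toy data (an assumption for illustration): item 2 is the negation of item 0,
# so the two items never co-occur as 1s, yet item 0 fully determines item 2.
data = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
cost, parent = best_parent(data, 2)
print(cost, parent)
```

In this toy example, splitting on item 0 makes item 2 free to encode even though the two items are never 1 together, which is the kind of interaction that a co-occurrence count alone would miss.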

