Mining Statistical Information of Frequent Fault-Tolerant Patterns in Transactional Databases

Seventh IEEE International Conference on Data Mining (ICDM 2007). Pub Date: 2007-10-28. DOI: 10.1109/ICDM.2007.48

Ardian Kristanto Poernomo, V. Gopalkrishnan


Citations: 20

Abstract


Mining Statistical Information of Frequent Fault-Tolerant Patterns in Transactional Databases

Constraints applied to classic frequent patterns are too strict and may cause interesting patterns to be missed. Hence, researchers have proposed mining a more relaxed version of frequent patterns, in which transactions are allowed to miss some items of the itemset they support. Patterns exhibiting such "faults" are called frequent fault-tolerant patterns (FFT-patterns) if they are significant in number. In this paper, the term "pattern" is distinguished from "itemset": a pattern refers to a pair (tidset × itemset). Unlike classical frequent patterns, the number of FFT-patterns grows exponentially not only with the number of items but also with the number of transactions. Since the latter may reach millions, mining FFT-patterns by enumerating them becomes infeasible. Hence, the challenge is to represent FFT-patterns concisely without losing any useful information. To address this, we draw on the observation that, in transactional databases, the transactions themselves are not important from the data mining point of view; i.e., researchers are interested in finding itemsets contained in many transactions rather than in the transactions per se. Therefore, we propose to mine only the frequent itemsets along with the statistical information of the supporting transaction sets, rather than enumerate entire FFT-patterns. We then present our approach to this problem, the BIAS framework, which consists of a backtracking algorithm, Integer Linear Programming (ILP) constraints, and aggregation statistics. Algorithms under this framework not only increase the efficiency of the FFT-pattern mining process by more than an order of magnitude, but also provide a more comprehensive analysis of FFT-patterns.
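To make the relaxed notion of support concrete, here is a minimal Python sketch. It is not the BIAS framework and does not reproduce the paper's formal FFT-pattern definition; it assumes a simplified relaxation in which a transaction supports an itemset if it misses at most `max_missing` of its items. The function names `ft_support` and `missing_histogram`, the parameter `max_missing`, and the toy data are all hypothetical. The second function illustrates the idea of reporting a statistic about the supporting transactions (here, a histogram of how many items each one misses) instead of listing the transactions themselves.

```python
from collections import Counter


def ft_support(transactions, itemset, max_missing):
    """Return the ids of transactions that miss at most `max_missing` items of `itemset`.

    Simplified relaxation for illustration only (an assumption, not the paper's definition).
    """
    itemset = frozenset(itemset)
    return [
        tid
        for tid, items in enumerate(transactions)
        if len(itemset - set(items)) <= max_missing
    ]


def missing_histogram(transactions, itemset, max_missing):
    """Summarize the supporting transactions as {number of missing items: count}."""
    itemset = frozenset(itemset)
    hist = Counter()
    for items in transactions:
        missing = len(itemset - set(items))
        if missing <= max_missing:
            hist[missing] += 1
    return dict(hist)


if __name__ == "__main__":
    # Toy database: each transaction is a set of items (hypothetical data).
    txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"a", "b", "c", "d"}]
    target = {"a", "b", "c"}

    print(ft_support(txns, target, 0))         # exact support: [0, 4]
    print(ft_support(txns, target, 1))         # one fault allowed: [0, 1, 2, 4]
    print(missing_histogram(txns, target, 1))  # {0: 2, 1: 2}
```

In this toy example, allowing one fault doubles the support of {a, b, c} from two transactions to four, and the histogram conveys that information in two numbers regardless of how many transactions the database holds.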
