频繁近似序列模式的有效发现

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI:10.1109/ICDM.2007.75

Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu

{"title":"频繁近似序列模式的有效发现","authors":"Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu","doi":"10.1109/ICDM.2007.75","DOIUrl":null,"url":null,"abstract":"We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a \"break-down-and-build-up\" methodology. The \"breakdown\" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the \"build-up\" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"Efficient Discovery of Frequent Approximate Sequential Patterns\",\"authors\":\"Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu\",\"doi\":\"10.1109/ICDM.2007.75\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a \\\"break-down-and-build-up\\\" methodology. The \\\"breakdown\\\" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the \\\"build-up\\\" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.\",\"PeriodicalId\":233758,\"journal\":{\"name\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2007.75\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.75","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 39

摘要

提出了一种在汉明距离模型下挖掘频繁近似序列模式的有效算法。我们的算法通过采用“分解-构建”方法来提高效率。“分解”是基于这样的观察，即所有频繁模式的出现都可以被分类为组，我们称之为链。我们开发了有效的算法，通过迭代增长快速挖掘所有链。在“组装”阶段，这些线被组合起来形成支撑套，从中可以识别出所有近似的图案。我们算法的一个显著特征是它能够通过以局部搜索方式迭代地组装显著大小的构建块来增长频繁模式。通过避免增量增长和全局搜索，我们在不损失挖掘结果完整性的情况下实现了更高的效率。我们的实验研究表明，我们的算法在挖掘全球重复的近似序列模式方面是有效的，而现有的方法可能会错过这些模式。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient Discovery of Frequent Approximate Sequential Patterns

We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a "break-down-and-build-up" methodology. The "breakdown" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the "build-up" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Seventh IEEE International Conference on Data Mining (ICDM 2007)

自引率

0.00%

发文量