A fast algorithm for subspace clustering by pattern similarity

Haixun Wang, F. Chu, W. Fan, Philip S. Yu, J. Pei
Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.
Publication date: 2004-06-21
DOI: 10.1109/SSDBM.2004.3 (https://doi.org/10.1109/SSDBM.2004.3)
Citations: 43

Abstract

Unlike traditional clustering methods, which group objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including large-scale scientific data analysis, target marketing, and Web usage analysis. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which makes them unsuitable for many real-life applications. Furthermore, besides their huge volume, many datasets are also characterized by their sequentiality; for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods (i) to handle large datasets and (ii) to discover pattern similarity embedded in data sequences. In this paper, we present a novel algorithm that offers this capability. Experimental results on both real-life and synthetic datasets demonstrate its effectiveness and efficiency.
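The abstract does not spell out how a "coherent pattern of rise and fall" is measured. As a minimal sketch, assuming the pScore criterion from the earlier pCluster work that the abstract cites as the state of the art, a set of objects forms a pattern-based cluster on a set of dimensions when every pair of objects differs by a nearly constant offset across every pair of those dimensions. The function names, the threshold `delta`, and the sample values below are illustrative assumptions, and this brute-force check is not the fast algorithm the paper presents.

```python
from itertools import combinations


def pscore(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on dimensions a, b.

    A value of 0 means x and y rise or fall by exactly the same amount
    when moving from dimension b to dimension a.
    """
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(objects, dims, delta):
    """Check (by brute force) that every 2x2 submatrix has pScore <= delta."""
    return all(
        pscore(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(dims, 2)
    )


# Hypothetical objects over five dimensions (illustrative values only).
o1 = [1, 5, 3, 9, 40]
o2 = [11, 15, 13, 19, 2]   # equals o1 shifted by +10 on dimensions 0..3
o3 = [4, 2, 9, 1, 7]       # follows a different pattern

same_subspace = is_delta_pcluster([o1, o2], [0, 1, 2, 3], delta=0)
full_space = is_delta_pcluster([o1, o2], [0, 1, 2, 3, 4], delta=0)
with_outlier = is_delta_pcluster([o1, o2, o3], [0, 1, 2, 3], delta=1)
```

Here `o1` and `o2` have very different values, so a distance-based method would not group them, yet they share the same pattern in the subspace of dimensions 0 to 3. The pattern breaks once dimension 4 or object `o3` is included. Checking all pairs this way is quadratic in both objects and dimensions, which hints at why scaling to large datasets is the concern the abstract raises.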