A Weighted Subspace Clustering Algorithm in High-Dimensional Data Streams

Jiadong Ren, Lining Li, Changzhen Hu
Published in: 2009 Fourth International Conference on Innovative Computing, Information and Control (ICICIC)
Publication date: 2009-12-07
DOI: 10.1109/ICICIC.2009.64 (https://doi.org/10.1109/ICICIC.2009.64)
Citations: 5

Abstract

Clustering is an important and difficult problem in data stream mining because large volumes of streaming data arrive continuously. High-dimensional data streams make clustering analysis more complex because the data are sparse. In this paper, we propose a new clustering method for high-dimensional data streams, called WSCStream. The method incorporates a fading cluster structure and a dimensional weight matrix. In the matrix, we assign a weight to each dimension of the corresponding cluster; the weight indicates the importance of that dimension to the cluster. As new data points arrive over time, the weighted distance between a cluster and a data point is used to obtain the final clusters. Experimental results on real and synthetic datasets demonstrate that WSCStream achieves higher clustering quality than PHStream.
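
The abstract names three ingredients (a fading cluster structure, per-cluster dimension weights, and a weighted point-to-cluster distance) but does not give their formulas. The following is a minimal sketch of how such pieces might fit together, not WSCStream's published definitions. The decay factor 2^(-decay * Δt), the inverse-variance weighting rule, the `radius` threshold, and all names (`FadingCluster`, `weighted_distance`, `assign`) are assumptions made for illustration.

```python
import math


class FadingCluster:
    """Decayed per-dimension statistics for one cluster (hypothetical structure)."""

    def __init__(self, point, t, decay=0.01):
        self.decay = decay                 # assumed decay rate
        self.t = t                         # time of last update
        self.n = 1.0                       # decayed point count
        self.ls = list(point)              # decayed linear sum per dimension
        self.ss = [x * x for x in point]   # decayed squared sum per dimension

    def fade(self, t):
        # Assumed fading function: statistics shrink by 2^(-decay * elapsed time).
        f = 2.0 ** (-self.decay * (t - self.t))
        self.n *= f
        self.ls = [v * f for v in self.ls]
        self.ss = [v * f for v in self.ss]
        self.t = t

    def add(self, point, t):
        self.fade(t)
        self.n += 1.0
        for j, x in enumerate(point):
            self.ls[j] += x
            self.ss[j] += x * x

    def centroid(self):
        return [v / self.n for v in self.ls]

    def weights(self, eps=1e-6):
        # Assumed rule: a dimension with small variance inside the cluster
        # gets a large weight; weights are normalised to sum to 1.
        c = self.centroid()
        inv = [1.0 / (max(s / self.n - m * m, 0.0) + eps)
               for s, m in zip(self.ss, c)]
        total = sum(inv)
        return [v / total for v in inv]


def weighted_distance(cluster, point):
    """Weighted Euclidean distance from a point to a cluster centroid."""
    c = cluster.centroid()
    w = cluster.weights()
    return math.sqrt(sum(wj * (x - cj) ** 2 for wj, x, cj in zip(w, point, c)))


def assign(clusters, point, t, radius=1.0, decay=0.01):
    """Add the point to the nearest cluster, or open a new one if none is close."""
    for c in clusters:
        c.fade(t)
    best = min(clusters, key=lambda c: weighted_distance(c, point), default=None)
    if best is not None and weighted_distance(best, point) <= radius:
        best.add(point, t)
        return best
    new = FadingCluster(point, t, decay)
    clusters.append(new)
    return new
```

In this sketch a cluster that is tight along one dimension and spread along another weights the tight dimension heavily, so a point that deviates along the spread dimension still counts as close. That is one way to read "the importance of each dimension to the corresponding cluster"; the paper's actual weighting scheme may differ.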