Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering.

IF 10.2; Zone 1 (Computer Science); JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems. Pub date: 2025-05-20. DOI: 10.1109/tnnls.2025.3563769

Yiqun Zhang, Sen Feng, Pengkai Wang, Zexi Tan, Xiaopeng Luo, Yuzhu Ji, Rong Zou, Yiu-Ming Cheung


Citations: 0

Abstract

Streaming data clustering is a popular research topic in data mining and machine learning. Because streaming data is usually analyzed in chunks, it is particularly susceptible to the dynamic cluster imbalance issue: the imbalance ratio (IR) of clusters changes over time, which can easily cause fluctuations in either the accuracy or the efficiency of streaming data clustering. This paper therefore proposes an accurate and efficient streaming data clustering approach that adapts to drifting and imbalanced cluster distributions. We first design a self-growth map (SGM) that automatically arranges neurons on demand according to the local distribution, thereby achieving fast, incremental adaptation to the streaming distributions. Because SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters in imbalanced distributions. We also propose a fast hierarchical merging (HM) strategy that combines the neurons among which relatively large clusters have been split. HM exploits the maintained SGM to quickly retrieve intracluster distribution pairs for merging, which circumvents the most laborious step, a global search. As a result, the proposed SGM can incrementally adapt to the distributions of new chunks, and the self-growth map-guided hierarchical merging for imbalanced data clustering (SOHI) approach can quickly discover the true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.
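The abstract describes SGM and HM only at a high level, so the following is a minimal, hypothetical sketch in plain Python of the two ideas it names: growing neurons on demand as chunks arrive, then merging neurons by inspecting only links stored in the map rather than all neuron pairs. It is not the authors' SOHI implementation. The class `GrowingMapSketch`, the `grow_threshold`/`merge_threshold`/`lr` parameters, the winner/runner-up linking rule, union-find merging, and the IR definition (largest cluster size over smallest) are all illustrative assumptions.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """IR taken as largest cluster size / smallest cluster size (assumed definition)."""
    sizes = Counter(labels).values()
    return max(sizes) / min(sizes)


class GrowingMapSketch:
    """Hypothetical growing neuron map with graph-guided merging (not the SOHI code)."""

    def __init__(self, grow_threshold, merge_threshold, lr=0.1):
        self.grow_threshold = grow_threshold    # spawn a neuron if none is this close
        self.merge_threshold = merge_threshold  # merge linked neurons closer than this
        self.lr = lr                            # step size for moving the winning neuron
        self.neurons = []                       # prototype vectors (lists of floats)
        self.edges = set()                      # neighbour links stored as (i, j), i < j

    def _link(self, i, j):
        if i != j:
            self.edges.add((min(i, j), max(i, j)))

    def partial_fit(self, chunk):
        """Incrementally adapt the map to one data chunk."""
        for x in chunk:
            ranked = sorted(range(len(self.neurons)),
                            key=lambda k: math.dist(x, self.neurons[k]))
            if not ranked or math.dist(x, self.neurons[ranked[0]]) > self.grow_threshold:
                # Grow: the local region is uncovered, so place a new neuron on x
                self.neurons.append(list(x))
                if ranked:  # link the new neuron to its nearest existing neuron
                    self._link(len(self.neurons) - 1, ranked[0])
                continue
            win = ranked[0]
            # Move the winning neuron a small step toward x
            self.neurons[win] = [w + self.lr * (xi - w)
                                 for w, xi in zip(self.neurons[win], x)]
            if len(ranked) > 1:
                self._link(win, ranked[1])  # winner and runner-up become neighbours
        return self

    def merge(self):
        """Union linked neurons within merge_threshold; only map edges are examined."""
        parent = list(range(len(self.neurons)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i, j in self.edges:
            if math.dist(self.neurons[i], self.neurons[j]) <= self.merge_threshold:
                parent[find(i)] = find(j)
        return [find(i) for i in range(len(self.neurons))]

    def predict(self, points):
        """Label each point by the merged group of its nearest neuron."""
        groups = self.merge()
        return [groups[min(range(len(self.neurons)),
                           key=lambda k: math.dist(p, self.neurons[k]))]
                for p in points]


# Toy imbalanced stream: a 25-point cluster, then a distant 3-point cluster
big = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
small = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
sgm = GrowingMapSketch(grow_threshold=0.25, merge_threshold=1.0)
sgm.partial_fit(big).partial_fit(small)  # two chunks arriving in sequence
labels = sgm.predict(big + small)
print(len(sgm.neurons), len(set(labels)), imbalance_ratio(labels))
```

In this toy run the large cluster is covered by several neurons whose spawn links are short enough to be merged, while every link to the distant small cluster exceeds `merge_threshold`, so the small cluster survives as its own group. Restricting merge candidates to stored edges loosely mirrors the abstract's point about avoiding a global pairwise search; the thresholds here are hand-picked for the toy data.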


Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 23.80

Self-citation rate: 9.60%

Articles published: 2102

Review time: 3-8 weeks

Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.