{"title":"Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering.","authors":"Yiqun Zhang,Sen Feng,Pengkai Wang,Zexi Tan,Xiaopeng Luo,Yuzhu Ji,Rong Zou,Yiu-Ming Cheung","doi":"10.1109/tnnls.2025.3563769","DOIUrl":null,"url":null,"abstract":"Streaming data clustering is a popular research topic in data mining and machine learning. Since streaming data is usually analyzed in data chunks, it is more susceptible to encountering the dynamic cluster imbalance issue. That is, the imbalance ratio (IR) of clusters changes over time, which can easily lead to fluctuations in either the accuracy or the efficiency of streaming data clustering. Therefore, an accurate and efficient streaming data clustering approach is proposed to adapt to the drifting and imbalanced cluster distributions. We first design a self-growth map (SGM) that can automatically arrange neurons on demand according to local distribution, and thus achieve fast and incremental adaptation to the streaming distributions. Since SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters among imbalanced distributions. We also propose a fast hierarchical merging (HM) strategy to combine the neurons that break up the relatively large clusters. It exploits the maintained SGM to quickly retrieve the intracluster distribution pairs for merging, which circumvents the most laborious global searching. It turns out that the proposed SGM can incrementally adapt to the distributions of new chunks, and the self-growth map-guided hierarchical merging for the imbalanced data clustering (SOHI) approach can quickly explore a true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"148 1","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3563769","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Streaming data clustering is a popular research topic in data mining and machine learning. Since streaming data is usually analyzed in data chunks, it is more susceptible to encountering the dynamic cluster imbalance issue. That is, the imbalance ratio (IR) of clusters changes over time, which can easily lead to fluctuations in either the accuracy or the efficiency of streaming data clustering. Therefore, an accurate and efficient streaming data clustering approach is proposed to adapt to the drifting and imbalanced cluster distributions. We first design a self-growth map (SGM) that can automatically arrange neurons on demand according to local distribution, and thus achieve fast and incremental adaptation to the streaming distributions. Since SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters among imbalanced distributions. We also propose a fast hierarchical merging (HM) strategy to combine the neurons that break up the relatively large clusters. It exploits the maintained SGM to quickly retrieve the intracluster distribution pairs for merging, which circumvents the most laborious global searching. It turns out that the proposed SGM can incrementally adapt to the distributions of new chunks, and the self-growth map-guided hierarchical merging for the imbalanced data clustering (SOHI) approach can quickly explore a true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.
期刊介绍:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.