Bo Liang, Jianghui Cai, Haifeng Yang, Gao Jie, Yaling Xun, Yupeng Wang, Fujiang Yuan
Applied Soft Computing, vol. 181, Article 113442. Published 2025-06-26. DOI: 10.1016/j.asoc.2025.113442. Impact factor: 6.6. JCR: Q1 (Computer Science, Artificial Intelligence). Not open access. Source: https://www.sciencedirect.com/science/article/pii/S1568494625007537
A short-term memory clustering algorithm for evolving data streams
Data stream clustering is a fundamental problem in many streaming data analysis applications. It faces two key challenges: (a) efficiently using initial results to update clusters, and (b) effectively managing concept drift in non-stationary data, which otherwise degrades clustering accuracy over time. To address these challenges, this paper presents STM-Stream, a new short-term memory clustering algorithm for evolving data streams. Short-term memory refers to storing the nucleus and radius of cell groups as the window slides. Clustering then proceeds in three steps. First, a cell split method obtains the initial data distribution. Second, a novel dynamic projection strategy fuses the stored data distribution with the distribution of newly arriving data. Third, based on the updated memory, an adaptive group-radius grouping and merging method produces the final clustering result. Because concept drift occurs frequently during clustering, the internal processes of four drift types (sudden, gradual, incremental, and reoccurring) are analyzed and discussed. From these, the article extracts two main change processes, gradual and sudden drift, to characterize how data migrates. Through dynamic projection and the adaptive group radius, the algorithm automatically corrects its memory under sudden or gradual changes, and the same mechanism is equally effective for incremental and reoccurring drift. Experiments show that STM-Stream effectively addresses the concept drift that frequently occurs in continuously generated streaming data, thereby preventing clustering accuracy from declining over time.
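The abstract gives no formulas, so the following is only a rough sketch of the general idea (summarize each window into cells, fuse the summaries into a decaying memory, then group nearby cells), not the STM-Stream algorithm itself. The fixed grid used for the cell split, the weighted-average update standing in for dynamic projection, the overlap test standing in for adaptive group-radius merging, and all names and parameters (`summarize_window`, `fuse`, `group`, `cell_size`, `decay`, `min_weight`) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def summarize_window(points, cell_size):
    """Assumed cell split: bucket points on a fixed grid and keep,
    per cell, a nucleus (centroid), a radius, and a weight (count)."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        buckets[key].append(p)
    cells = []
    for members in buckets.values():
        n = len(members)
        nucleus = tuple(sum(axis) / n for axis in zip(*members))
        spread = max(math.dist(nucleus, m) for m in members)
        cells.append({"nucleus": nucleus,
                      "radius": max(spread, cell_size / 2),
                      "weight": float(n)})
    return cells


def fuse(memory, new_cells, decay=0.5, min_weight=0.5):
    """Assumed stand-in for dynamic projection: age the memory, then
    pull overlapping cells toward new data (gradual change) or add
    unmatched cells as new entries (sudden change)."""
    for cell in memory:
        cell["weight"] *= decay
    for new in new_cells:
        near = [m for m in memory
                if math.dist(m["nucleus"], new["nucleus"])
                <= m["radius"] + new["radius"]]
        if near:
            old = min(near, key=lambda m: math.dist(m["nucleus"], new["nucleus"]))
            total = old["weight"] + new["weight"]
            old["nucleus"] = tuple(
                (old["weight"] * a + new["weight"] * b) / total
                for a, b in zip(old["nucleus"], new["nucleus"]))
            old["radius"] = max(old["radius"], new["radius"])
            old["weight"] = total
        else:
            memory.append(dict(new))
    # Forget cells that have not been refreshed for several windows.
    return [m for m in memory if m["weight"] >= min_weight]


def group(memory):
    """Assumed merging rule: cells whose radii overlap share a cluster.
    Returns one cluster label per memory cell (union-find roots)."""
    parent = list(range(len(memory)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(memory)):
        for j in range(i + 1, len(memory)):
            gap = math.dist(memory[i]["nucleus"], memory[j]["nucleus"])
            if gap <= memory[i]["radius"] + memory[j]["radius"]:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(memory))]
```

A caller would feed one window at a time, for example `memory = fuse(memory, summarize_window(window, cell_size=1.0))`, and call `group(memory)` whenever cluster labels are needed. In this sketch a cell that stops receiving data loses half its weight per window and is eventually dropped, which is one simple way to keep the memory short-term.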
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.