{"title":"Incremental clustering based on Wasserstein distance between histogram models","authors":"Xiaotong Qian , Guénaël Cabanes , Parisa Rastin , Mohamed Alae Guidani , Ghassen Marrakchi , Marianne Clausel , Nistor Grozavu","doi":"10.1016/j.patcog.2025.111414","DOIUrl":null,"url":null,"abstract":"<div><div>In this article, we present an innovative clustering framework designed for large datasets and real-time data streams which uses a sliding window and histogram model to address the challenge of memory congestion while reducing computational complexity and improving cluster quality for both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of cluster distributions through histogram models, regardless of their distribution type. This advantage allows for efficient use with various conventional clustering algorithms. To facilitate effective clustering across windows, we use a statistical measure that allows the comparison and merging of different clusters based on the calculation of the Wasserstein distance between histograms.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111414"},"PeriodicalIF":7.5000,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325000743","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
In this article, we present an innovative clustering framework designed for large datasets and real-time data streams which uses a sliding window and histogram model to address the challenge of memory congestion while reducing computational complexity and improving cluster quality for both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of cluster distributions through histogram models, regardless of their distribution type. This advantage allows for efficient use with various conventional clustering algorithms. To facilitate effective clustering across windows, we use a statistical measure that allows the comparison and merging of different clusters based on the calculation of the Wasserstein distance between histograms.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.