An Empirical Comparison of Stream Clustering Algorithms
Matthias Carnein, Dennis Assenmacher, H. Trautmann
Proceedings of the Computing Frontiers Conference, 2017-05-15
DOI: 10.1145/3075564.3078887
Citations: 35
Abstract
Analysing streaming data has received considerable attention in recent years. A key research area in this field is stream clustering, which aims to recognize patterns in a possibly unbounded data stream of varying speed and structure. Over the past decades, a multitude of new stream clustering algorithms have been proposed. However, to the best of our knowledge, no rigorous analysis and comparison of the different approaches has been performed. Our paper fills this gap and provides extensive experiments for a total of ten popular algorithms. We utilize a number of standard data sets, both real and synthetic, and identify key weaknesses and strengths of the existing algorithms.