Clustering Data Streams Using Mass Estimation
Andrei Sorin Sabau
2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, published 2013-09-23
DOI: 10.1109/SYNASC.2013.45 (https://doi.org/10.1109/SYNASC.2013.45)
The explosive growth of data generation, storage, and analysis over the last decade has led to extensive research on stream mining algorithms. The stream clustering literature contains both adaptations of classical methods and novel approaches that address the space and time scalability issues arising from high-volume, high-velocity information assets. This paper presents MaStream, a novel stream clustering algorithm with constant space complexity and sub-linear average-case time complexity. The algorithm uses mass estimation as an alternative to density estimation and does not employ any distance measure, which makes it highly adaptable to both low- and high-dimensional data streams. Using an evolving ensemble of h:d-Trees, the algorithm identifies arbitrarily shaped clusters and handles both noise and outliers without a priori information such as the total number of clusters. Experimental results on a series of synthetic and real datasets illustrate the algorithm's performance.
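To make the idea of distance-free mass estimation more concrete, here is a minimal sketch in Python under stated assumptions. Each tree has a fixed depth and splits a randomly chosen dimension at the midpoint of its current range (a half-space-tree-style rule), and the mass of a point is the number of stream points that landed in the same leaf, averaged over the ensemble. The classes `HalfSpaceTree` and `MassEnsemble`, the parameters `mins`, `maxs`, `n_trees`, `depth`, and `seed`, and the midpoint split rule are all hypothetical; the abstract does not specify how MaStream builds or evolves its h:d-Trees or how it turns mass into clusters.

```python
import random


class HalfSpaceTree:
    """One fixed-depth random tree over a bounded work space (hypothetical sketch).

    Assumption: each internal node splits a randomly chosen dimension at the
    midpoint of that node's range; MaStream's actual h:d-Tree construction may
    differ. Leaves store how many stream points fell into them.
    """

    def __init__(self, mins, maxs, depth, rng):
        self.depth = depth
        self.splits = {}  # node path (tuple of 0/1) -> (dimension, split value)
        self.counts = {}  # leaf path -> number of stream points seen there
        self._build((), list(mins), list(maxs), 0, rng)

    def _build(self, path, lo, hi, level, rng):
        if level == self.depth:
            self.counts[path] = 0
            return
        dim = rng.randrange(len(lo))
        split = (lo[dim] + hi[dim]) / 2.0
        self.splits[path] = (dim, split)
        left_hi = hi.copy()
        left_hi[dim] = split
        right_lo = lo.copy()
        right_lo[dim] = split
        self._build(path + (0,), lo, left_hi, level + 1, rng)
        self._build(path + (1,), right_lo, hi, level + 1, rng)

    def leaf(self, x):
        # Route x by comparing single coordinates with split values; no distances.
        path = ()
        for _ in range(self.depth):
            dim, split = self.splits[path]
            path += (0,) if x[dim] < split else (1,)
        return path

    def insert(self, x):
        self.counts[self.leaf(x)] += 1

    def mass(self, x):
        return self.counts[self.leaf(x)]


class MassEnsemble:
    """Average leaf mass over several independently randomized trees."""

    def __init__(self, mins, maxs, n_trees=10, depth=4, seed=0):
        rng = random.Random(seed)
        self.trees = [HalfSpaceTree(mins, maxs, depth, rng) for _ in range(n_trees)]

    def insert(self, x):
        for tree in self.trees:
            tree.insert(x)

    def mass(self, x):
        return sum(tree.mass(x) for tree in self.trees) / len(self.trees)


if __name__ == "__main__":
    ens = MassEnsemble(mins=[0.0, 0.0], maxs=[1.0, 1.0])
    for _ in range(50):
        ens.insert((0.2, 0.2))  # a dense region of the stream
    ens.insert((0.9, 0.9))      # a single outlier
    print(ens.mass((0.21, 0.21)))  # 50.0: shares the dense region's leaves
    print(ens.mass((0.9, 0.9)))    # 1.0: isolated point, low mass
```

Because every tree has a fixed number of leaves, memory in this sketch stays at `n_trees * 2**depth` counters no matter how many points arrive, and each insertion or query costs O(`n_trees * depth`) comparisons of single coordinates. It only illustrates the kind of constant-space, distance-free estimate the abstract describes; it is not MaStream itself.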