The Dynamic Hyper-ellipsoidal Micro-Clustering for Evolving Data Stream Using Only Incoming Datum
Narongrid Tangpathompong, U. Suksawatchon, J. Suksawatchon
Proceedings of the 2nd International Conference on Intelligent Information Processing, 2017-07-17
DOI: 10.1145/3144789.3144818 (https://doi.org/10.1145/3144789.3144818)
Citations: 0
Abstract
Data stream clustering is becoming an efficient way to cluster massive online data. The clustering task requires a process that can partition data continuously using an incremental learning method. In this paper, we present a new clustering method, called DyHEMstream, which combines an online phase and an offline phase. In the online phase, a dynamic hyper-ellipsoidal micro-cluster is proposed to keep summary information about the evolving data stream, updated from each new incoming data sample. The shape of the proposed micro-cluster can represent the incoming data better than a traditional micro-cluster. The algorithm processes each data point in a one-pass fashion without storing the entire data set. In the offline phase, the final clusters are generated by expanding the hyper-ellipsoidal micro-clusters. DyHEMstream is evaluated on various synthetic data sets using different quality metrics and compared with a well-known data stream clustering algorithm, DenStream. Based on purity, Rand index, and Jaccard index, DyHEMstream achieves better clustering quality than DenStream on noisy data containing clusters of different shapes, sizes, and densities.
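The abstract does not state the update rule of the micro-cluster, so the following is not the authors' algorithm. It is a minimal sketch of how an ellipsoidal summary could be maintained in one pass from the incoming datum alone, assuming the summary consists of a count, a mean, and a scatter matrix updated with a Welford-style recurrence, and assuming that membership is judged by squared Mahalanobis distance. The names `EllipsoidSummary`, `solve`, and `mahalanobis_sq` are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


class EllipsoidSummary:
    """Hypothetical one-pass summary of a micro-cluster (assumed design)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        # Running sum of outer products of deviations from the mean.
        self.scatter = [[0.0] * dim for _ in range(dim)]

    def update(self, x: Sequence[float]) -> None:
        """Absorb one point using only the point and the stored summary."""
        self.n += 1
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self.scatter[i][j] += di * dj

    def covariance(self) -> List[List[float]]:
        """Sample covariance matrix; needs at least two points."""
        if self.n < 2:
            raise ValueError("need at least two points")
        return [[v / (self.n - 1) for v in row] for row in self.scatter]

    def mahalanobis_sq(self, x: Sequence[float]) -> float:
        """Squared Mahalanobis distance from x to the summary's mean."""
        diff = [xi - mi for xi, mi in zip(x, self.mean)]
        y = solve(self.covariance(), diff)
        return sum(d * yi for d, yi in zip(diff, y))
```

In this sketch, a new point would be absorbed with `update` when `mahalanobis_sq` falls below a chosen threshold, and would otherwise start a new summary. The threshold and that assignment policy are assumptions for illustration only. No past points are stored, which matches the one-pass constraint described in the abstract.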