The Dynamic Hyper-ellipsoidal Micro-Clustering for Evolving Data Stream Using Only Incoming Datum

Narongrid Tangpathompong, U. Suksawatchon, J. Suksawatchon
Proceedings of the 2nd International Conference on Intelligent Information Processing, published 2017-07-17. DOI: 10.1145/3144789.3144818 (https://doi.org/10.1145/3144789.3144818)

Abstract

Data stream clustering is becoming an efficient way to cluster massive online data. The task requires a process that can partition data continuously using an incremental learning method. In this paper, we present a new clustering method, called DyHEMstream, which consists of an online phase and an offline phase. In the online phase, we propose a dynamic hyper-ellipsoidal micro-cluster that keeps summary information about the evolving data stream and is updated using only each new incoming data sample. The shape of the proposed micro-cluster can represent the incoming data better than a traditional micro-cluster. The algorithm processes each data point in a one-pass fashion without storing the entire data set. In the offline phase, the final clusters are formed by expanding the hyper-ellipsoidal micro-clusters. DyHEMstream is evaluated on various synthetic data sets using several quality metrics and compared with a well-known data stream clustering algorithm, DenStream. Based on purity, Rand index, and Jaccard index, DyHEMstream achieves better clustering quality than DenStream on noisy data containing clusters of different shapes, sizes, and densities.
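
The three quality metrics named in the abstract have standard textbook definitions, and a minimal sketch of them may help make the comparison concrete. The code below is not taken from the paper: the function names (`purity`, `pair_counts`, `rand_index`, `jaccard_index`) are hypothetical, and it assumes each point carries one predicted cluster label and one ground-truth label, with noise points treated as ordinary labels. How the authors handle noise in their evaluation is not stated in the abstract.

```python
from collections import Counter
from itertools import combinations


def purity(pred, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    by_cluster = {}
    for p, t in zip(pred, truth):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(pred)


def pair_counts(pred, truth):
    """Classify every pair of points by whether it shares a cluster and/or a class.

    Returns (both, pred_only, truth_only, neither).
    """
    both = pred_only = truth_only = neither = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_truth = truth[i] == truth[j]
        if same_pred and same_truth:
            both += 1
        elif same_pred:
            pred_only += 1
        elif same_truth:
            truth_only += 1
        else:
            neither += 1
    return both, pred_only, truth_only, neither


def rand_index(pred, truth):
    """Share of point pairs on which the clustering and the ground truth agree."""
    both, pred_only, truth_only, neither = pair_counts(pred, truth)
    return (both + neither) / (both + pred_only + truth_only + neither)


def jaccard_index(pred, truth):
    """Like the Rand index, but ignores pairs separated in both partitions."""
    both, pred_only, truth_only, _ = pair_counts(pred, truth)
    return both / (both + pred_only + truth_only)


# Illustrative labels only (assumption, not data from the paper).
truth = ["a", "a", "a", "b", "b", "b"]
pred = [0, 0, 1, 1, 1, 1]
print(purity(pred, truth), rand_index(pred, truth), jaccard_index(pred, truth))
```

For the six illustrative points above, one "a" point is placed in the wrong cluster, which gives a purity of 5/6, a Rand index of 10/15, and a Jaccard index of 4/9. The pairwise loop is quadratic in the number of points, so this sketch suits small evaluation sets rather than a full stream.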