An agent-based dual-tier algorithm for clustering data streams

Dongbin Zhou, Lifeng Jia, Zhe Wang, Xiujuan Xu, Chunguang Zhou
2006 IEEE International Conference on Granular Computing. Published 2006-05-10. DOI: 10.1109/GRC.2006.1635855 (https://doi.org/10.1109/GRC.2006.1635855)

Abstract

The characteristics of data streams make it difficult for clustering algorithms to meet requirements on both efficiency and effectiveness. This paper proposes a dual-tier data stream clustering algorithm that employs an agent-based method. In the on-line process, a set of agents working simultaneously collects similar data points into sub-clusters by applying a heuristic strategy. In the off-line process, summary information from the on-line component is analyzed further to obtain the final clusters. The algorithm also supports time-window queries on streams. Empirical evidence shows that the method obtains high-quality clusters with low time complexity.

[...] analysis over an arbitrary period of the stream, etc. For stream clustering, a common method is to divide the streaming data into chunks and apply algorithms for static data sets to each subset separately (2). In recent years, stream algorithms have developed into a two-phase structure (3), (4). A dual framework usually includes two parts: the on-line component and the off-line component. The former is responsible for fast but rough processing of the streaming data and for saving summary information, so as to meet the one-pass restriction, while the latter uses that information to conduct high-level analysis. At present, stream algorithms still face several problems, for example: sensitivity to the initial data points; poor cluster quality due to the loss of global information caused by dividing the stream; and high time complexity.

This paper proposes AGCluStream, a novel dual-tier clustering algorithm for data streams. The on-line algorithm uses agents to make similar points denser in local areas and records the temporary distribution of the data according to the pyramidal time frame (3). The off-line algorithm uses these records to conduct time-window analysis and higher-level clustering analysis. AGCluStream does not divide the stream, and it adopts an incomplete-partition strategy to maintain global information more effectively.
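The division of labor between the two tiers can be illustrated with a small sketch. It is not the AGCluStream implementation: the agent heuristic is not specified in the text above, so the on-line tier here simply maintains one additive summary (count, sum, sum of squares) over 1-D points. The snapshot rule is one common formulation of a pyramidal time frame and is an assumption: a snapshot taken at time t is filed under the largest order i for which alpha**i divides t, and only the most recent `keep` snapshots of each order are retained. All names (`Summary`, `SnapshotStore`, `snapshot_order`, `summarize_stream`, `alpha`, `keep`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Summary:
    """Additive summary of 1-D points: count, sum, sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> "Summary":
        return Summary(self.n + 1, self.s + x, self.ss + x * x)

    def minus(self, older: "Summary") -> "Summary":
        # Subtracting an older snapshot leaves the summary of the points
        # that arrived after that snapshot was taken.
        return Summary(self.n - older.n, self.s - older.s, self.ss - older.ss)

    def mean(self) -> float:
        return self.s / self.n


def snapshot_order(t: int, alpha: int = 2) -> int:
    """Largest i such that alpha**i divides t (assumed filing rule)."""
    if t < 1 or alpha < 2:
        raise ValueError("need t >= 1 and alpha >= 2")
    order = 0
    while t % (alpha ** (order + 1)) == 0:
        order += 1
    return order


class SnapshotStore:
    """Keeps only the last `keep` snapshots of each order."""

    def __init__(self, alpha: int = 2, keep: int = 3) -> None:
        self.alpha = alpha
        self.keep = keep
        self.by_order: Dict[int, List[Tuple[int, Summary]]] = {}

    def record(self, t: int, summary: Summary) -> None:
        bucket = self.by_order.setdefault(snapshot_order(t, self.alpha), [])
        bucket.append((t, summary))
        if len(bucket) > self.keep:
            bucket.pop(0)  # discard the oldest snapshot of this order

    def closest_before(self, t: int) -> Tuple[int, Summary]:
        """Stored snapshot with the largest timestamp not exceeding t."""
        found = [snap for bucket in self.by_order.values()
                 for snap in bucket if snap[0] <= t]
        if not found:
            raise LookupError(f"no snapshot at or before time {t}")
        return max(found, key=lambda snap: snap[0])


def summarize_stream(values: Iterable[float], alpha: int = 2,
                     keep: int = 3) -> Tuple[Summary, SnapshotStore]:
    """On-line tier: one pass over the stream, one snapshot per tick."""
    store = SnapshotStore(alpha, keep)
    current = Summary()
    for t, x in enumerate(values, start=1):
        current = current.add(x)
        store.record(t, current)
    return current, store


# Off-line tier: summarize the last 4 ticks of a 16-tick stream.
current, store = summarize_stream(range(1, 17))
t_old, old = store.closest_before(16 - 4)
window = current.minus(old)
print(t_old, window.n, window.mean())  # prints: 12 4 14.5
```

Because the summaries are additive, the off-line tier never revisits raw points: a time-window query reduces to subtracting one stored snapshot from the current summary. Recent history is covered densely by low-order snapshots and older history only sparsely by high-order ones, so the window boundary is exact for recent horizons and approximate for distant ones.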