A critical review of density-based data stream clustering techniques

Affan Ahmad Toor, M. Usman, W. Ahmed
2016 Eleventh International Conference on Digital Information Management (ICDIM)
Published: 2016-09-01
DOI: 10.1109/ICDIM.2016.7829786 (https://doi.org/10.1109/ICDIM.2016.7829786)
Citations: 1

Abstract

Data streams are a relatively new and emerging domain in the current era of Internet advancement. Clustering data streams is both important and difficult because of the numerous hurdles attached to it. A number of algorithms have been proposed to offer solutions for efficient clustering. The grid-based clustering approach was adopted a few years ago to overcome the limitations of conventional partition-based algorithms for data stream clustering. Data points are mapped to grid cells to form micro-clusters, which are later used for clustering. Using density in the clustering process has proved to be a remarkable success, and in recent years many researchers have used density to find clusters of arbitrary shape and density and to identify outliers. The concept of density-based clustering is to use grid-based clustering at its core, distinguish dense grids from sparse grids using density threshold values, and use the dense grids to yield clustering results, which provides higher cluster purity and accuracy. In this paper, we review grid-based data stream clustering algorithms that utilize density. We evaluate their functionalities and identify their limitations. Finally, we critically evaluate different aspects of the algorithms and suggest the one that is better in terms of performance and accuracy.
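The grid-and-threshold idea described in the abstract can be illustrated with a small sketch. This is not any specific algorithm from the review: the function names, the `cell_size` and `density_threshold` parameters, the use of a raw point count as density, and the face-adjacency rule for joining dense cells are all assumptions made for illustration.

```python
from collections import Counter, deque
from math import floor


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(floor(x / cell_size) for x in point)


def cluster_dense_cells(points, cell_size=1.0, density_threshold=3):
    """Group dense grid cells into clusters; return (clusters, outliers).

    Hypothetical helper: density is the raw point count per cell, and
    cells are joined when they share a face (both are assumptions).
    """
    density = Counter(cell_of(p, cell_size) for p in points)
    dense = {c for c, n in density.items() if n >= density_threshold}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over face-adjacent dense cells.
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.append(cell)
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(members))

    # Points that fall in sparse cells are treated as outlier candidates.
    outliers = [p for p in points if cell_of(p, cell_size) not in dense]
    return clusters, outliers
```

This sketch processes a fixed batch of points. In a streaming setting the per-cell densities would be updated incrementally as points arrive, and the algorithms covered by the review may define density and cell adjacency differently.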