Discovering Clusters with Arbitrary Shapes and Densities in Data Streams

A. Magdy, N. A. Yousri, Nagwa M. El-Makky
Published in: 2011 10th International Conference on Machine Learning and Applications and Workshops
Publication date: 2011-12-18
DOI: 10.1109/ICMLA.2011.56 (https://doi.org/10.1109/ICMLA.2011.56)
Citations: 6

Abstract

Streaming data is now available in many fields and in many forms, which makes its analysis increasingly important. The sheer volume of continuously flowing data poses a number of challenges for data stream analysis. Exploring the structure of streamed data is a major one, and it has motivated a variety of clustering algorithms. However, current clustering algorithms still cannot efficiently discover clusters of arbitrary densities in data streams. This paper proposes a new grid-based and density-based algorithm for clustering streaming data that addresses this drawback of recent algorithms. The algorithm uses an online component to map the input data to grid cells. An offline component then clusters the grid cells based on density information. Relative density relatedness measures and a dynamic range neighborhood are proposed to differentiate clusters of arbitrary densities. The experimental evaluation shows considerable improvements over state-of-the-art algorithms in both clustering quality and scalability. In addition, the output quality of the proposed algorithm is less sensitive to parameter selection errors.
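The two-phase structure described in the abstract (an online step that maps points to grid cells, then an offline step that clusters cells by density) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the relative density relatedness measures or the dynamic range neighborhood, so the sketch substitutes a simple density ratio between adjacent cells and a fixed one-cell neighborhood. The class name `GridStreamClusterer` and the parameters `cell_size`, `min_density`, and `ratio` are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque
from itertools import product


class GridStreamClusterer:
    """Illustrative two-phase grid clusterer (a stand-in, not the paper's method)."""

    def __init__(self, cell_size=1.0, min_density=2, ratio=0.5):
        self.cell_size = cell_size      # assumption: side length of each grid cell
        self.min_density = min_density  # assumption: cells with fewer points are noise
        self.ratio = ratio              # assumption: minimum density ratio to link cells
        self.density = defaultdict(int)  # point count per grid cell

    def insert(self, point):
        """Online step: map one incoming point to its grid cell."""
        cell = tuple(int(x // self.cell_size) for x in point)
        self.density[cell] += 1
        return cell

    def _neighbors(self, cell):
        # Fixed one-cell neighborhood (the paper proposes a dynamic range instead).
        for offset in product((-1, 0, 1), repeat=len(cell)):
            if any(offset):
                yield tuple(c + o for c, o in zip(cell, offset))

    def _related(self, a, b):
        # Stand-in relatedness test: densities must be within a relative ratio.
        da, db = self.density[a], self.density[b]
        return min(da, db) / max(da, db) >= self.ratio

    def cluster(self):
        """Offline step: group adjacent cells whose densities are similar."""
        dense = {c for c, d in self.density.items() if d >= self.min_density}
        labels = {}
        next_label = 0
        for start in sorted(dense):
            if start in labels:
                continue
            labels[start] = next_label
            queue = deque([start])
            while queue:  # breadth-first growth of one cluster
                cell = queue.popleft()
                for nb in self._neighbors(cell):
                    if nb in dense and nb not in labels and self._related(cell, nb):
                        labels[nb] = next_label
                        queue.append(nb)
            next_label += 1
        return labels
```

Calling `insert` once per arriving point and `cluster` on demand mirrors the online/offline split. Because cells are linked by a relative density ratio instead of a single global threshold, a dense region and a sparse region that touch can end up in separate clusters, which is the general idea behind handling arbitrary densities.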