Kamran Khan, S. Rehman, Kamran Aziz, S. Fong, S. Sarasvady, Amrita Vishwa
DBSCAN: Past, present and future
The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), published 2014-05-15
DOI: 10.1109/ICADIWT.2014.6814687 (https://doi.org/10.1109/ICADIWT.2014.6814687)
Citations: 273
Abstract
Data mining comprises data analysis techniques for extracting hidden and interesting patterns from large datasets. Clustering techniques are important for extracting knowledge from the large amounts of spatial data collected in applications such as GIS, satellite imaging, X-ray crystallography, remote sensing, and environmental assessment and planning. To extract useful patterns from these complex data sources, several popular spatial data clustering techniques have been proposed. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a pioneering density-based algorithm. It can discover clusters of arbitrary shape and size, even in databases containing noise and outliers. DBSCAN, however, is known to have a number of problems: (a) it requires the user to specify parameter values before the algorithm can run; (b) it struggles to identify meaningful clusters in datasets with varying densities; and (c) it incurs a non-trivial computational cost. Many researchers have attempted to enhance the basic DBSCAN algorithm to overcome these drawbacks, producing variants such as VDBSCAN, FDBSCAN, DD_DBSCAN, and IDBSCAN. In this study, we survey the variations of DBSCAN that have been proposed so far. These variations are critically evaluated and their limitations are listed.
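For readers unfamiliar with the base algorithm, the following is a minimal sketch of textbook DBSCAN in plain Python, not code from the surveyed paper. The abstract does not name the user-supplied parameters; the sketch assumes the two conventional ones, a neighbourhood radius (`eps`) and a minimum neighbour count (`min_pts`). The helper names `region_query` and `dbscan` and the sample points are illustrative assumptions.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may still become a border point later.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                # Previously rejected as core, but reachable: border point.
                labels[j] = cluster
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself a core point, so the cluster grows through it.
                queue.extend(j_neighbors)
    return labels


# Two dense groups and one isolated point (made-up sample data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The sketch makes the three listed drawbacks concrete: both `eps` and `min_pts` must be chosen by the user, a single global `eps` cannot suit regions of different density, and the brute-force `region_query` scans every point, giving quadratic running time unless a spatial index is used.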