A New Method for Automatic Determining of the DBSCAN Parameters

IF 3.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Artur Starczewski, P. Goetzen, M. Er
{"title":"A New Method for Automatic Determining of the DBSCAN Parameters","authors":"Artur Starczewski, P. Goetzen, M. Er","doi":"10.2478/jaiscr-2020-0014","DOIUrl":null,"url":null,"abstract":"Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.","PeriodicalId":48494,"journal":{"name":"Journal of Artificial Intelligence and Soft Computing Research","volume":"10 1","pages":"209 - 221"},"PeriodicalIF":3.3000,"publicationDate":"2020-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence and Soft Computing Research","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2478/jaiscr-2020-0014","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 20

Abstract

Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.
一种自动确定DBSCAN参数的新方法
聚类是一种有吸引力的技术,用于许多领域,以处理大规模数据。目前已经提出了许多聚类算法。最流行的算法包括基于密度的方法。这类算法可以识别数据集中任意形状的簇。其中最常见的是基于密度的带噪声应用空间聚类(DBSCAN)。原始的DBSCAN算法在各种应用中得到了广泛的应用,并进行了许多不同的修改。然而,有一个基本的问题是正确选择它的两个输入参数,即eps半径和MinPts密度阈值。当集群内密度变化显著时,这些参数的选择尤其困难。本文提出了一种确定不同类型聚类的参数值的新方法。该方法使用由一个函数生成的急剧距离增加检测,该函数计算数据集的每个元素与其第k个最近邻居之间的距离。在多个不同的数据集上进行了实验,结果证实了该方法的良好性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Artificial Intelligence and Soft Computing Research
Journal of Artificial Intelligence and Soft Computing Research COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
7.00
自引率
25.00%
发文量
10
审稿时长
24 weeks
期刊介绍: Journal of Artificial Intelligence and Soft Computing Research (available also at Sciendo (De Gruyter)) is a dynamically developing international journal focused on the latest scientific results and methods constituting traditional artificial intelligence methods and soft computing techniques. Our goal is to bring together scientists representing both approaches and various research communities.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信