SW-DBSCAN: A Grid-based DBSCAN Algorithm for Large Datasets
Negar Ohadi, A. Kamandi, M. Shabankhah, Seyed Mohsen Fatemi, S. Hosseini, Alireza Mahmoudi
2020 6th International Conference on Web Research (ICWR), published 2020-04-01
DOI: 10.1109/ICWR49608.2020.9122313 (https://doi.org/10.1109/ICWR49608.2020.9122313)
Citations: 16
Abstract
Data clustering aims to discover the underlying structure of data. It has many applications in data analysis and is one of the most widely used tools in data mining. DBSCAN is one of the best-known clustering algorithms. Its advantages are that it identifies clusters of various shapes and determines the number of clusters itself. Because DBSCAN is sensitive to its two parameters, ε and MinPts, it may perform poorly when the dataset is unbalanced. To solve this problem, this paper proposes a sliding-window DBSCAN clustering algorithm for unbalanced data that uses gridding and local parameters, which we refer to as SW-DBSCAN. The algorithm divides the dataset into several grids. The size and shape of each grid depend on the local sample density. Then, for each grid, the parameters are adjusted for local clustering, and the resulting data zones are finally merged. Experimental results show that this algorithm can improve the performance of DBSCAN and can handle arbitrary and asymmetric data.
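
The abstract gives only the outline of the method (partition into grids, tune parameters per grid, cluster locally, merge), so the following is a minimal sketch of that outline and not the authors' implementation. Several choices are assumptions made for illustration: the cells are fixed-size squares (the paper adapts size and shape to density and uses a sliding window, neither of which is specified here), the local ε is taken as the mean distance to the (MinPts − 1)-th nearest other point in the cell, and two local clusters are merged when points from different cells lie within the smaller of their local ε values. The names `local_eps`, `grid_dbscan`, and `lattice` are hypothetical.

```python
import math
from collections import defaultdict


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual MinPts convention.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reach)
    return labels


def local_eps(points, min_pts):
    """Assumed heuristic: mean distance to the (min_pts - 1)-th nearest other point."""
    kth = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points)
        kth.append(d[min_pts - 1])  # d[0] is the distance of p to itself
    return sum(kth) / len(kth)


def grid_dbscan(points, cell, min_pts):
    """Cluster each fixed-size grid cell with its own eps, then merge across cells."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell), int(y // cell))].append(idx)

    labels = [-1] * len(points)
    eps_of = [0.0] * len(points)
    cell_of = [None] * len(points)
    next_id = 0
    for key, members in cells.items():
        if len(members) < min_pts:
            continue  # too few points for any core point: leave as noise
        pts = [points[i] for i in members]
        eps = local_eps(pts, min_pts)
        local = dbscan(pts, eps, min_pts)
        for i, lab in zip(members, local):
            eps_of[i], cell_of[i] = eps, key
            if lab != -1:
                labels[i] = next_id + lab
        next_id += max(local) + 1

    # Assumed merge rule: union clusters from different cells that touch.
    parent = list(range(next_id))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    clustered = [i for i, lab in enumerate(labels) if lab != -1]
    for a in clustered:
        for b in clustered:
            if (labels[a] < labels[b] and cell_of[a] != cell_of[b]
                    and math.dist(points[a], points[b]) <= min(eps_of[a], eps_of[b])):
                parent[find(labels[b])] = find(labels[a])
    return [find(lab) if lab != -1 else -1 for lab in labels]


def lattice(x0, y0, step, nx, ny):
    """Helper for the demo: an nx-by-ny grid of points with the given spacing."""
    return [(x0 + step * i, y0 + step * j) for i in range(nx) for j in range(ny)]


# A dense group, a sparse group, and a group that straddles the cell border at x = 5.
points = lattice(1, 6, 0.1, 3, 3) + lattice(11, 11, 1.0, 3, 3) + lattice(4.5, 1, 0.2, 6, 3)
labels = grid_dbscan(points, cell=5.0, min_pts=4)
print(sorted(set(labels)))
```

In this toy input the dense and sparse groups differ in spacing by a factor of ten, so a single global ε chosen for the dense group would leave the sparse one as noise, which is the situation the local parameters are meant to address. The pairwise merge loop is quadratic and is kept only for brevity; a practical version would compare points in adjacent cells only.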