TMsDP: two-stage density peak clustering based on multi-strategy optimization

IF 1.7 4区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Technologies and Applications Pub Date : 2022-08-10 DOI:10.1108/dta-08-2021-0222

Jie Ma, Zhiyuan Hao, Mo Hu

{"title":"TMsDP: two-stage density peak clustering based on multi-strategy optimization","authors":"Jie Ma, Zhiyuan Hao, Mo Hu","doi":"10.1108/dta-08-2021-0222","DOIUrl":null,"url":null,"abstract":"PurposeThe density peak clustering algorithm (DP) is proposed to identify cluster centers by two parameters, i.e. ρ value (local density) and δ value (the distance between a point and another point with a higher ρ value). According to the center-identifying principle of the DP, the potential cluster centers should have a higher ρ value and a higher δ value than other points. However, this principle may limit the DP from identifying some categories with multi-centers or the centers in lower-density regions. In addition, the improper assignment strategy of the DP could cause a wrong assignment result for the non-center points. This paper aims to address the aforementioned issues and improve the clustering performance of the DP.Design/methodology/approachFirst, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy to extend the searching range of the potential cluster centers. Second, they design different novel calculation methods for calculating the domain distance, point-domain density and domain similarity. Third, they adopt domain similarity to achieve the domain merging process and optimize the final clustering results.FindingsThe experimental results on analyzing 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.Originality/valueThe authors propose a novel DP-based clustering method, i.e. TMsDP, and transform the relationship between points into that between domains to ultimately further optimize the clustering performance of the DP.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-08-2021-0222","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 1

Abstract

PurposeThe density peak clustering algorithm (DP) is proposed to identify cluster centers by two parameters, i.e. ρ value (local density) and δ value (the distance between a point and another point with a higher ρ value). According to the center-identifying principle of the DP, the potential cluster centers should have a higher ρ value and a higher δ value than other points. However, this principle may limit the DP from identifying some categories with multi-centers or the centers in lower-density regions. In addition, the improper assignment strategy of the DP could cause a wrong assignment result for the non-center points. This paper aims to address the aforementioned issues and improve the clustering performance of the DP.Design/methodology/approachFirst, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing the pinhole imaging strategy to extend the searching range of the potential cluster centers. Second, they design different novel calculation methods for calculating the domain distance, point-domain density and domain similarity. Third, they adopt domain similarity to achieve the domain merging process and optimize the final clustering results.FindingsThe experimental results on analyzing 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.Originality/valueThe authors propose a novel DP-based clustering method, i.e. TMsDP, and transform the relationship between points into that between domains to ultimately further optimize the clustering performance of the DP.

查看原文本刊更多论文

TMsDP：基于多策略优化的两阶段密度峰值聚类

目的提出密度峰值聚类算法(DP)，通过ρ值(局部密度)和δ值(ρ值较大的点与另一个点之间的距离)两个参数来识别聚类中心。根据DP的中心识别原理，潜在簇中心的ρ值和δ值应高于其他点。然而，这一原则可能会限制DP识别一些具有多中心或低密度区域中心的类别。另外，不恰当的DP分配策略可能会导致非中心点的错误分配结果。本文旨在解决上述问题，提高DP的聚类性能。设计/方法/方法首先，为了识别尽可能多的潜在聚类中心，作者引入针孔成像策略构建点域，以扩大潜在聚类中心的搜索范围。其次，他们设计了不同的计算方法来计算域距离、点域密度和域相似度。第三，采用域相似度实现域合并过程，优化最终聚类结果。在12个合成数据集和12个真实数据集上的实验结果表明，基于多策略优化(TMsDP)的两阶段密度峰值聚类优于多策略优化和其他最先进的算法。本文提出了一种新的基于DP的聚类方法TMsDP，并将点与点之间的关系转化为域与域之间的关系，最终进一步优化了DP的聚类性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Data Technologies and Applications Social Sciences-Library and Information Sciences

CiteScore

3.80

自引率

6.20%

发文量

期刊介绍： Previously published as: Program Online from: 2018 Subject Area: Information & Knowledge Management, Library Studies