KNEG-CL: Unveiling data patterns using a k-nearest neighbor evolutionary graph for efficient clustering

IF 8.1, Zone 1 (Computer Science), COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences, Pub Date: 2024-10-30, DOI: 10.1016/j.ins.2024.121602

Zexuan Fei, Yan Ma, Jinfeng Zhao, Bin Wang, Jie Yang

Information Sciences, Volume 690, Article 121602. https://www.sciencedirect.com/science/article/pii/S0020025524015160

Abstract

Existing clustering algorithms often struggle with diverse and complex datasets, largely because they depend on Euclidean distance. Accurately quantifying distances between data points is also challenging in practice, and the curse of dimensionality further degrades clustering performance on high-dimensional datasets. This paper introduces an approach to overcome these challenges: the k-nearest neighbor evolution graph (kNEG), an unweighted directed graph that evolves by incrementally adding directed edges as the value of k increases. This design captures details such as data point density and the direction of density variation. We present kNEG-CL, a clustering algorithm derived from kNEG, which uses vertex degree and edge directionality to cluster data. kNEG-CL is guided by two principles: using vertex degrees to identify density peaks, and assessing the balance of outgoing and incoming edges to decide subcluster merging. The algorithm identifies density peaks, uses a density-boosting search for initial partitioning, and then applies a two-stage merging process to achieve high clustering accuracy. Extensive testing across varied datasets demonstrates the superior performance of kNEG-CL in both clustering accuracy and computational efficiency, particularly on large-scale and high-dimensional data.
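
The graph idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how neighbors are ranked, so plain Euclidean ranking is assumed here, and the function names `neighbor_order` and `evolve` are hypothetical. The sketch only shows a directed graph that gains one outgoing edge per vertex each time k increases, together with the cumulative in-degree that a degree-based density estimate could use. The density-boosting search and the two-stage merging are not covered.

```python
import numpy as np


def neighbor_order(points):
    """Rank every other point by distance, per point (assumption: Euclidean)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1, kind="stable")


def evolve(points, k_max):
    """Yield (k, new_edges, in_degree) as k grows from 1 to k_max.

    At step k, each vertex i gains a directed edge to its k-th nearest
    neighbor. in_degree is cumulative over all edges added so far.
    Requires k_max <= number of points - 1.
    """
    order = neighbor_order(points)
    n = order.shape[0]
    in_degree = np.zeros(n, dtype=int)
    for k in range(1, k_max + 1):
        new_edges = {(i, int(order[i, k - 1])) for i in range(n)}
        for _, j in new_edges:
            in_degree[j] += 1
        yield k, new_edges, in_degree.copy()


if __name__ == "__main__":
    # Three nearby points and one distant point on a line.
    sample = [[0.0], [1.0], [2.5], [10.0]]
    for k, edges, deg in evolve(sample, 2):
        print(k, sorted(edges), deg.tolist())
```

In this toy input the distant point never receives an incoming edge, while the points in the dense group accumulate in-degree as k grows, which is the kind of signal the abstract associates with density peaks.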

Source journal

Information Sciences (Engineering and Technology, Computer Science: Information Systems)

CiteScore

14.00

Self-citation rate

17.30%

Articles published

1322

Review time

10.4 months

Journal description: Informatics and Computer Science Intelligent Systems Applications is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions. The journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.