KNEG-CL: Unveiling data patterns using a k-nearest neighbor evolutionary graph for efficient clustering

Zexuan Fei, Yan Ma, Jinfeng Zhao, Bin Wang, Jie Yang

Information Sciences, Volume 690, Article 121602 (2024). DOI: 10.1016/j.ins.2024.121602
https://www.sciencedirect.com/science/article/pii/S0020025524015160
Abstract
Existing clustering algorithms often struggle to handle diverse and complex datasets, largely due to their dependence on Euclidean distance. Accurately quantifying distances between data points also poses challenges in practical scenarios, and the curse of dimensionality further degrades performance on high-dimensional data. This paper introduces an innovative approach to overcome these challenges: the k-nearest neighbor evolution graph (kNEG), an unweighted directed graph that evolves by incrementally adding directed edges as the value of k increases. This design captures intricate details such as data point density and the direction of density variation. We present kNEG-CL, a clustering algorithm derived from kNEG, which leverages vertex degree and edge directionality to cluster data intuitively. kNEG-CL is guided by two principles: using vertex degrees to identify density peaks, and assessing the balance of outgoing and incoming edges when merging subclusters. By identifying density peaks and using a density-boosting search for the initial partitioning, followed by a two-stage merging process, the algorithm achieves high clustering accuracy. Extensive testing across varied datasets demonstrates the superior performance of kNEG-CL, particularly on large-scale and high-dimensional data, highlighting its effectiveness in both clustering accuracy and computational efficiency.
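The abstract describes the kNEG construction and the use of vertex degree as a density signal only at a high level. The sketch below is an illustrative, non-authoritative reconstruction of that idea, not the authors' implementation: it assumes a plain k-NN interpretation in which each vertex gains one new outgoing edge per increment of k, and it uses in-degree in the accumulated digraph as a simple density proxy for peak detection. The function names (build_kneg, in_degrees, find_density_peaks) and the parameter k_max are invented for illustration and do not appear in the paper.

```python
# Illustrative sketch (assumption, not the paper's method): a k-NN "evolution"
# graph built by adding directed edges as k grows, with vertex in-degree used
# as a density proxy for picking density peaks.
import numpy as np

def build_kneg(X, k_max):
    """Return edges[k] = directed edges (i -> j) added at neighbor rank k.

    The union of edges[1..k] is the k-NN digraph for that k, so the graph
    'evolves' by accumulating one new outgoing edge per vertex per step.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances (brute force; fine for a sketch).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)        # exclude self-neighbors
    order = np.argsort(d2, axis=1)      # order[i, r] = (r+1)-th nearest neighbor of i
    return {k: [(i, int(order[i, k - 1])) for i in range(n)]
            for k in range(1, k_max + 1)}

def in_degrees(edges, n, k):
    """In-degree of each vertex in the accumulated graph up to neighbor rank k."""
    deg = np.zeros(n, dtype=int)
    for kk in range(1, k + 1):
        for _, j in edges[kk]:
            deg[j] += 1
    return deg

def find_density_peaks(X, k, n_peaks):
    """Pick the n_peaks vertices with the highest in-degree as density peaks."""
    edges = build_kneg(X, k)
    deg = in_degrees(edges, X.shape[0], k)
    return np.argsort(-deg)[:n_peaks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs as toy data.
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
    print("density-peak indices:", find_density_peaks(X, k=10, n_peaks=2))
```

The paper's full pipeline (density-boosting search for the initial partition and the two-stage merge guided by edge directionality) is not reproduced here; this sketch only shows how an incrementally built k-NN digraph can expose density peaks through vertex degree.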
Journal Introduction
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.