Khurram Khan;Atiq ur Rehman;Adnan Khan;Syed Rameez Naqvi;Samir Brahim Belhaouari;Amine Bermak
A Nonparametric Split and Kernel-Merge Clustering Algorithm
IEEE Transactions on Artificial Intelligence, vol. 5, no. 9, pp. 4443-4457. Published 2024-03-27. DOI: 10.1109/TAI.2024.3382248
https://ieeexplore.ieee.org/document/10480882/
Abstract
This work proposes split and kernel-merge clustering (S-KMC), a novel nonparametric clustering algorithm that combines the strengths of hierarchical, partitional, and density-based clustering. It consists of two main phases: splitting and merging. In the splitting phase, a ranking-based operator divides the data into optimal subclusters. In the merging phase, a kernel function estimates the density of these subclusters after they are projected onto a straight line passing through their centers, which guides the merging operation. S-KMC is fully nonparametric and requires no prior information about the data. It effectively handles 1) shape diversity, 2) density variability, 3) high dimensionality, 4) outliers, and 5) missing values. The algorithm offers easily tunable hyperparameters, enhancing its applicability to complex problems and its robustness against data anomalies. Experiments on 21 benchmark datasets show that S-KMC improves clustering accuracy, the handling of high-dimensional data, and the management of data anomalies and outliers. Comparisons with state-of-the-art techniques further show that S-KMC performs comparably to or better than existing methods.
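The merging idea described in the abstract (project two subclusters onto the line through their centers, then estimate density along that line with a kernel) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the bandwidth, or the merge rule, so the Gaussian kernel, the Silverman-style bandwidth, the `valley_ratio` threshold, and every function name below are assumptions made for illustration only.

```python
import numpy as np


def project_onto_center_line(a, b):
    """Project two subclusters onto the line through their centers.

    Returns 1-D coordinates along that line, with 0 at the center of `a`
    and `distance` at the center of `b`.
    """
    center_a, center_b = a.mean(axis=0), b.mean(axis=0)
    direction = center_b - center_a
    distance = np.linalg.norm(direction)
    if distance == 0.0:
        raise ValueError("subcluster centers coincide; the line is undefined")
    unit = direction / distance
    t_a = (a - center_a) @ unit
    t_b = (b - center_a) @ unit
    return t_a, t_b, distance


def gaussian_kde_1d(samples, grid, bandwidth):
    """Plain Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * bandwidth)


def should_merge(a, b, valley_ratio=0.5):
    """Hypothetical merge test: merge when no deep density valley separates the centers.

    `valley_ratio` is an illustrative threshold, not a parameter from the paper.
    """
    t_a, t_b, distance = project_onto_center_line(a, b)
    samples = np.concatenate([t_a, t_b])
    # Assumption: Silverman-style rule of thumb, using the tighter subcluster's spread.
    spread = min(t_a.std(), t_b.std())
    bandwidth = 1.06 * spread * samples.size ** (-0.2)
    # Evaluate the density only on the segment between the two centers.
    grid = np.linspace(0.0, distance, 200)
    density = gaussian_kde_1d(samples, grid, bandwidth)
    density_at_centers = min(density[0], density[-1])
    return bool(density.min() >= valley_ratio * density_at_centers)
```

Under these assumptions, two halves of a single Gaussian blob show no density valley between their centers and would be merged, while two well-separated blobs show a near-zero density at the midpoint and would be kept apart. Reducing the comparison to one dimension is what keeps the density estimate cheap even when the original data are high-dimensional.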