Title: Density Viewpoint Clustering Outlier Detection Algorithm With Feature Weight and Entropy
Authors: Gezi Shi, Jing Li, Tangbao Zou, Haiyan Yu, Feng Zhao
Journal: Concurrency and Computation-Practice & Experience, vol. 37, issue 9-11
Published: 2025-04-12
DOI: 10.1002/cpe.70086
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70086
Impact factor: 1.5; JCR: Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
The k-means outlier removal (KMOR) algorithm measures similarity with a distance criterion and places outliers in a separate cluster, so that clustering and outlier detection are carried out together. However, distance-based clustering outlier detection performs poorly on datasets with special distributions or a large number of outliers, and it is sensitive to its parameters and to the choice of cluster centers. This article therefore proposes a density viewpoint clustering outlier detection algorithm with feature weighting and entropy, which brings feature and entropy information into the clustering process. First, the algorithm adds entropy regularization to the objective function and controls the clustering process by minimizing the clustering dispersion while maximizing the negative entropy. Second, feature weights and regularization strategies are introduced into both the objective function and the outlier detection criterion, which improves detection accuracy on feature-imbalanced datasets while keeping the feature weights under control. In addition, a weighted distance function with data dimension normalization is used to calculate the viewpoint, and density viewpoint guidance steers the algorithm toward the correct cluster centers, improving overall performance. Finally, five experiments on synthetic datasets show that the algorithm reaches an average classification accuracy of 98.22%, higher than that of the compared algorithms. Further experiments on ten UCI datasets show that the algorithm balances data classification and outlier detection well.
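The abstract does not give the objective function or the update rules, so the following is only a rough sketch of the general idea and not the authors' algorithm. It combines a KMOR-style outlier cluster (a point is flagged when its weighted distance to the nearest center exceeds a multiple of the average inlier distance) with entropy-regularized feature weights, which take a softmax form when a term such as gamma * sum_j w_j log w_j is added to the dispersion. The density viewpoint step is omitted. The names `entropy_weights`, `cluster_with_outliers`, `gamma`, and `theta` are hypothetical and chosen for illustration.

```python
import math

def weighted_sq_dist(x, c, w):
    """Feature-weighted squared Euclidean distance between x and center c."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def entropy_weights(dispersion, gamma):
    """Minimize sum_j w_j*D_j + gamma*sum_j w_j*log(w_j) subject to sum_j w_j = 1.
    The closed-form solution is w_j proportional to exp(-D_j / gamma)."""
    m = min(dispersion)  # shift by the minimum for numerical stability
    e = [math.exp(-(d - m) / gamma) for d in dispersion]
    s = sum(e)
    return [v / s for v in e]

def cluster_with_outliers(X, init_centers, gamma=1.0, theta=3.0, iters=10):
    """Assumed simplification: label -1 marks the outlier cluster.
    theta >= 1 scales the average inlier distance into an outlier threshold."""
    k, dim = len(init_centers), len(X[0])
    centers = [list(c) for c in init_centers]
    w = [1.0 / dim] * dim          # start with equal feature weights
    labels = [0] * len(X)          # everything starts as an inlier
    for _ in range(iters):
        # 1) nearest center and weighted distance for every point
        near = []
        for x in X:
            d = [weighted_sq_dist(x, c, w) for c in centers]
            j = min(range(k), key=d.__getitem__)
            near.append((j, d[j]))
        # 2) threshold from points that were inliers in the previous pass
        inl = [d for (_, d), l in zip(near, labels) if l != -1]
        thresh = theta * sum(inl) / len(inl)
        labels = [j if d <= thresh else -1 for j, d in near]
        # 3) recompute centers from inliers only
        for j in range(k):
            members = [x for x, l in zip(X, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # 4) per-feature within-cluster dispersion, then entropy-regularized weights
        disp = [0.0] * dim
        for x, l in zip(X, labels):
            if l != -1:
                for f in range(dim):
                    disp[f] += (x[f] - centers[l][f]) ** 2
        w = entropy_weights(disp, gamma)
    return labels, centers, w
```

In this sketch a larger `gamma` pushes the feature weights toward uniform, while a smaller `gamma` concentrates weight on the features with the lowest within-cluster dispersion. Because the threshold is relative to the average inlier distance, `theta` controls how aggressively points are moved into the outlier cluster.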
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.