Density Viewpoint Clustering Outlier Detection Algorithm With Feature Weight and Entropy

IF 1.5 · JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING) · CAS Zone 4 (Computer Science)
Gezi Shi, Jing Li, Tangbao Zou, Haiyan Yu, Feng Zhao
DOI: 10.1002/cpe.70086
Concurrency and Computation: Practice and Experience, vol. 37, no. 9–11
Published: 2025-04-12 · Journal Article
Citations: 0

Abstract

The k-means outlier removal (KMOR) algorithm uses a distance criterion to measure similarity in cluster analysis and places outliers in a separate cluster, achieving outlier detection through clustering. However, distance-based clustering outlier detection performs poorly, and is sensitive to parameters and to the choice of cluster centers, on datasets with special distributions and large numbers of outliers. This article therefore proposes a density viewpoint clustering outlier detection algorithm with feature weighting and entropy, which incorporates feature and entropy information. First, the algorithm introduces entropy regularization into the objective function, controlling the clustering process by minimizing the clustering dispersion and maximizing the negative entropy. Second, feature weighting and regularization strategies are introduced into the objective function and the outlier detection criterion, improving detection accuracy on feature-imbalanced datasets while controlling the feature weights. In addition, a weighted distance function over dimension-normalized data is used to compute the viewpoint, and density-viewpoint guidance forms correct cluster centers, improving overall performance. Finally, five experiments on synthetic datasets show that the algorithm achieves an average classification accuracy of 98.22%, higher than the compared algorithms. Experiments on ten UCI datasets further show that the algorithm balances data classification and outlier detection well.
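The entropy-regularized, feature-weighted objective described above can be illustrated with a minimal sketch. This is an assumption-laden toy in the spirit of entropy-weighted k-means, not the paper's exact algorithm: the function name `entropy_weighted_kmeans` and the parameter `gamma` (entropy regularization strength) are hypothetical, and the density-viewpoint center guidance and the dedicated outlier cluster are not modeled here.

```python
# Sketch (NOT the paper's exact algorithm): entropy-regularized,
# feature-weighted k-means. Each cluster carries its own feature weights,
# updated as a softmax over per-feature within-cluster dispersions; gamma
# sets the strength of the entropy term.
import numpy as np

def entropy_weighted_kmeans(X, k, gamma=1.0, n_iter=50, init=None, seed=0):
    """Cluster X (n x d) into k groups with per-cluster feature weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    weights = np.full((k, d), 1.0 / d)  # start with uniform feature weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: minimal feature-weighted squared distance.
        dist = np.array([((X - c) ** 2 * w).sum(axis=1)
                         for c, w in zip(centers, weights)])  # shape (k, n)
        labels = dist.argmin(axis=0)
        # Update step: recompute centers and entropy-regularized weights.
        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            centers[l] = members.mean(axis=0)
            D = ((members - centers[l]) ** 2).sum(axis=0)  # per-feature dispersion
            e = np.exp(-D / gamma)    # small gamma -> favor low-dispersion features
            weights[l] = e / e.sum()  # softmax: weights sum to 1 per cluster
    return labels, centers, weights
```

A small `gamma` concentrates each cluster's weight mass on its least-dispersed features, while a large `gamma` keeps the weights near uniform, mirroring the trade-off between minimizing clustering dispersion and maximizing negative entropy described in the abstract.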

Source journal
Concurrency and Computation: Practice and Experience
Category: Engineering & Technology – Computer Science: Theory & Methods
CiteScore: 5.00
Self-citation rate: 10.00%
Articles per year: 664
Review time: 9.6 months
About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.