Density Viewpoint Clustering Outlier Detection Algorithm With Feature Weight and Entropy

IF 1.5 · JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING) · CAS Zone 4 (Computer Science)
Gezi Shi, Jing Li, Tangbao Zou, Haiyan Yu, Feng Zhao
DOI: 10.1002/cpe.70086
Concurrency and Computation: Practice and Experience, vol. 37, no. 9–11
Published: 2025-04-12 · Journal Article
Citations: 0

Abstract

The k-means outlier removal (KMOR) algorithm uses a distance criterion to measure similarity in cluster analysis and places outliers in a separate cluster, achieving outlier detection through clustering. However, distance-based clustering outlier detection performs poorly, and is sensitive to parameters and to the choice of cluster centers, on datasets with special distributions and large numbers of outliers. This article therefore proposes a density viewpoint clustering outlier detection algorithm with feature weighting and entropy, which incorporates feature and entropy information. First, the algorithm introduces entropy regularization into the objective function, controlling the clustering process by minimizing the clustering dispersion and maximizing the negative entropy. Second, feature weighting and regularization strategies are introduced into the objective function and the outlier detection criterion, improving detection accuracy on feature-imbalanced datasets while controlling the feature weights. In addition, a weighted distance function over dimension-normalized data is used to compute the viewpoint, and density-viewpoint guidance forms correct cluster centers, improving overall performance. Finally, five experiments on synthetic datasets show that the algorithm achieves an average classification accuracy of 98.22%, higher than the compared algorithms. Experiments on ten UCI datasets further show that the algorithm balances data classification and outlier detection well.
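The entropy-regularized, feature-weighted objective described above can be illustrated with a minimal sketch. This is an assumption-laden toy in the spirit of entropy-weighted k-means, not the paper's exact algorithm: the function name `entropy_weighted_kmeans` and the parameter `gamma` (entropy regularization strength) are hypothetical, and the density-viewpoint center guidance and the dedicated outlier cluster are not modeled here.

```python
# Sketch (NOT the paper's exact algorithm): entropy-regularized,
# feature-weighted k-means. Each cluster carries its own feature weights,
# updated as a softmax over per-feature within-cluster dispersions; gamma
# sets the strength of the entropy term.
import numpy as np

def entropy_weighted_kmeans(X, k, gamma=1.0, n_iter=50, init=None, seed=0):
    """Cluster X (n x d) into k groups with per-cluster feature weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    weights = np.full((k, d), 1.0 / d)  # start with uniform feature weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: minimal feature-weighted squared distance.
        dist = np.array([((X - c) ** 2 * w).sum(axis=1)
                         for c, w in zip(centers, weights)])  # shape (k, n)
        labels = dist.argmin(axis=0)
        # Update step: recompute centers and entropy-regularized weights.
        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            centers[l] = members.mean(axis=0)
            D = ((members - centers[l]) ** 2).sum(axis=0)  # per-feature dispersion
            e = np.exp(-D / gamma)    # small gamma -> favor low-dispersion features
            weights[l] = e / e.sum()  # softmax: weights sum to 1 per cluster
    return labels, centers, weights
```

A small `gamma` concentrates each cluster's weight mass on its least-dispersed features, while a large `gamma` keeps the weights near uniform, mirroring the trade-off between minimizing clustering dispersion and maximizing negative entropy described in the abstract.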

Source journal
Concurrency and Computation: Practice and Experience
Category: Engineering & Technology – Computer Science: Theory & Methods
CiteScore: 5.00
Self-citation rate: 10.00%
Articles per year: 664
Review time: 9.6 months
About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.