A robust clustering algorithm based on weighted optimization of fuzzy feature difference using shadow sets

Impact factor: 2.7; Zone 1 (Mathematics); JCR Q2 (COMPUTER SCIENCE, THEORY & METHODS)
Xiao Su, Bin Yu, Yunlong Liu
Journal: Fuzzy Sets and Systems, vol. 521, Article 109575
Published: 2025-09-01
DOI: 10.1016/j.fss.2025.109575
Article page: https://www.sciencedirect.com/science/article/pii/S0165011425003148
Open access: no
Citations: 0

Abstract

A robust clustering algorithm based on weighted optimization of fuzzy feature difference using shadow sets
Robustness is a critical criterion for evaluating clustering algorithms, particularly their stability and reliability across datasets. Noise and outliers pose significant challenges, yet robust clustering methods have proven resilient in uncovering inherent data structure. This study builds on foundational work from the turn of the century to advance robust clustering. We introduce a novel algorithm: a robust clustering algorithm based on weighted optimization of fuzzy feature difference using shadow sets (RWFS). The algorithm has three components: it performs multi-granular data partitioning based on granular computing to improve clustering adaptability; it handles boundary samples using shadow set theory, which strengthens the identification of fuzzy regions; and it evaluates within-cluster purity through information entropy to optimize the clustering structure. Together, these three components significantly improve robustness and clustering accuracy on complex, noisy data. Using the modified silhouette coefficient (MSC) as an internal metric, our evaluations confirm that RWFS can reveal underlying patterns in datasets affected by various noise types. Under 10% outlier noise, the average MSC across all benchmark datasets is 0.2275. Under 5% inter-cluster noise, the average MSC is 0.6037. Under 10% global Gaussian noise, the average MSC is 0.4686. The maximum accuracy (ACC) in the robustness experiments is 0.8436. This approach enhances robustness and supports a more nuanced characterization of relationships within clustering algorithms. Our findings highlight RWFS's potential as a powerful tool in machine learning, with broad implications for applications where data integrity and interpretability are crucial. The code is publicly available at https://github.com/yu7bin/RWFS.
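The abstract names shadow sets for boundary samples and information entropy for within-cluster purity, but gives no formulas. The sketch below illustrates only the general ideas and is not taken from the RWFS paper or repository. It assumes the usual shadowed-set convention: given a threshold alpha in (0, 0.5), memberships of at least 1 - alpha are treated as core, memberships of at most alpha as excluded, and everything in between as the shadow (boundary) region. The function names, the fixed alpha, and the use of reference labels to measure purity are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def shadow_partition(memberships, alpha=0.3):
    """Split sample indices into core, shadow, and exclusion regions.

    memberships: fuzzy membership degrees in [0, 1] for one cluster.
    alpha: hypothetical fixed threshold; how RWFS chooses it is not
    stated in the abstract.
    """
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 0.5)")
    core, shadow, exclusion = [], [], []
    for i, u in enumerate(memberships):
        if u >= 1.0 - alpha:
            core.append(i)        # confidently inside the cluster
        elif u <= alpha:
            exclusion.append(i)   # confidently outside the cluster
        else:
            shadow.append(i)      # boundary sample, membership uncertain
    return core, shadow, exclusion


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels found inside one cluster.

    Zero means the cluster is pure; larger values mean more mixing.
    Using reference labels here is an assumption for illustration.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    u = [0.95, 0.80, 0.55, 0.40, 0.10]
    print(shadow_partition(u, alpha=0.3))
    print(label_entropy(["a", "a", "a", "a"]))
    print(label_entropy(["a", "a", "b", "b"]))
```

In this toy setup, samples in the shadow region are the ones a robust method would treat with extra care, and a low entropy value indicates a cluster dominated by a single class.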
Source journal
Fuzzy Sets and Systems (Mathematics; Computer Science: Theory & Methods)
CiteScore: 6.50
Self-citation rate: 17.90%
Articles published: 321
Review time: 6.1 months
About the journal: Since its launch in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well-organized corpus of basic notions including (but not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, and a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies. In mathematics, fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, and analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.