Authors: Xiao Su, Bin Yu, Yunlong Liu
Journal: Fuzzy Sets and Systems, volume 521, Article 109575
Publication date: 2025-09-01
Publication type: Journal Article
DOI: 10.1016/j.fss.2025.109575
URL: https://www.sciencedirect.com/science/article/pii/S0165011425003148
Impact factor: 2.7; JCR: Q2 (Computer Science, Theory & Methods)
A robust clustering algorithm based on weighted optimization of fuzzy feature difference using shadow sets
Robustness is a critical metric for evaluating the performance of clustering algorithms, particularly their stability and reliability across various datasets. Despite the significant challenges posed by noise and outliers, robust clustering methods have shown resilience in uncovering inherent data structures. This study expands on foundational work from the turn of the century, advancing robust clustering through strategic innovation. We introduce a novel algorithm, the robust clustering algorithm based on weighted optimization of fuzzy feature difference using shadow sets (RWFS). The algorithm combines three components: it implements multi-granular data partitioning based on granular computing to enhance clustering adaptability; it optimizes the handling of boundary samples using shadow set theory, thereby strengthening the identification of fuzzy regions; and it evaluates the purity within clusters through information entropy to optimize the clustering structure. The integration of these three components significantly enhances the algorithm's robustness and clustering accuracy in the presence of complex data and noise. Using the modified silhouette coefficient (MSC) as an internal metric, our evaluations confirm RWFS's ability to reveal underlying patterns in datasets affected by various noise types. Under 10% outlier noise, the average MSC across all benchmark datasets is 0.2275. Under 5% inter-cluster noise, the average MSC is 0.6037. Under 10% global Gaussian noise, the average MSC is 0.4686. The maximum accuracy (ACC) in the robustness experiments is 0.8436. This approach enhances robustness and facilitates a more nuanced characterization of relationships within clustering algorithms. Our findings highlight RWFS's potential as a powerful tool in machine learning, with broad implications for applications where data integrity and interpretability are crucial. The code is publicly available online at https://github.com/yu7bin/RWFS.
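The abstract does not give the formulas behind the shadow-set and entropy steps, so the following is only a minimal sketch of the general ideas, not the RWFS implementation from the authors' repository. It assumes the classical shadow-set construction, in which memberships at or below a threshold alpha are reduced to 0, memberships at or above 1 - alpha are elevated to 1, and the rest form the shadow (boundary) region, with alpha chosen to balance the membership mass that is changed against the shadow size. The entropy helper is plain Shannon entropy over a histogram of counts. All function names and the candidate grid for alpha are hypothetical.

```python
import math


def shadow_partition(memberships, alpha):
    """Split sample indices into core, shadow, and exclusion regions.

    Assumes 0 < alpha < 0.5 so that the three regions are disjoint.
    """
    core = [i for i, u in enumerate(memberships) if u >= 1 - alpha]
    exclusion = [i for i, u in enumerate(memberships) if u <= alpha]
    shadow = [i for i, u in enumerate(memberships) if alpha < u < 1 - alpha]
    return core, shadow, exclusion


def balance_objective(memberships, alpha):
    """Classical shadow-set balance: |reduced mass + elevated mass - shadow size|."""
    reduced = sum(u for u in memberships if u <= alpha)
    elevated = sum(1 - u for u in memberships if u >= 1 - alpha)
    shadow_size = sum(1 for u in memberships if alpha < u < 1 - alpha)
    return abs(reduced + elevated - shadow_size)


def best_alpha(memberships):
    """Grid search over alpha in {0.01, ..., 0.49} (the grid is an assumption)."""
    candidates = [k / 100 for k in range(1, 50)]
    return min(candidates, key=lambda a: balance_objective(memberships, a))


def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram; 0 means a perfectly pure cluster."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Hypothetical fuzzy memberships of six samples in one cluster.
    u = [0.05, 0.10, 0.45, 0.55, 0.90, 0.95]
    alpha = best_alpha(u)
    core, shadow, exclusion = shadow_partition(u, alpha)
    print("alpha:", alpha)
    print("core:", core, "shadow:", shadow, "exclusion:", exclusion)
    # Entropy of how the samples are spread over the three regions.
    print("entropy:", shannon_entropy([len(core), len(shadow), len(exclusion)]))
```

In this reading, the samples in the shadow region are the boundary samples that would receive special handling, and a lower entropy indicates a purer cluster. How RWFS actually weights or combines these quantities is not stated in the abstract.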
About the journal:
Since its launch in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well-organized corpus of basic notions including (but not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, and a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies.
In mathematics, fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, and analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.