Improved safe semi-supervised clustering based on capped ℓ21 norm

IF 3.2 | Zone 1 (Mathematics) | Q2, COMPUTER SCIENCE, THEORY & METHODS

Fuzzy Sets and Systems | Pub Date: 2025-01-10 | DOI: 10.1016/j.fss.2025.109276

Haitao Gan, Zhi Yang, Ming Shi, Zhiwei Ye, Ran Zhou

Fuzzy Sets and Systems, volume 505, Article 109276. Full text: https://www.sciencedirect.com/science/article/pii/S0165011425000156

Citation count: 0

Abstract

In recent years, the concept of safe semi-supervised clustering (S3C) has received increasing attention within the semi-supervised learning community. Generally, existing S3C methods first analyze the risk of labeled instances and then try to mitigate the corresponding negative impacts through various risk-based regularization approaches. However, the adverse effects of high-probability mislabeled instances (HPMIs) are not eliminated, and corresponding useful discriminative information is not discovered effectively. To address these issues, we propose an improved S3C method based on capped ℓ21 norm, called CapS3FCM. The motivation is that the capped ℓ21 norm can effectively filter or find mislabeled instances. Consequently, CapS3FCM introduces two capped ℓ21 norms. The first norm aims to make use of label information while simultaneously alleviating negative influences of mislabeled instances, especially HPMIs. The second norm further aims to discover useful discriminative information of those HPMIs. Finally, a loss function based on the capped ℓ21 norms is built, and the optimization problem is solved using an efficient iterative optimization strategy. To verify the effectiveness of CapS3FCM, a series of experiments is carried out on several datasets, which demonstrate that CapS3FCM can outperform the other semi-supervised and S3C methods. These findings validate that the capped ℓ21 norm is both practical and effective.
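The abstract does not state the formula for the capped ℓ21 norm, so the following is only a minimal sketch under an assumed, commonly used definition: the sum over rows of min(‖row‖₂, cap). The function names, the toy residual matrix, and the cap value are hypothetical and are not taken from CapS3FCM. The sketch illustrates why capping can limit the influence of a suspicious instance: a row with a very large residual contributes at most `cap` to the total.

```python
import numpy as np


def capped_l21_norm(residuals, cap):
    """Assumed definition: sum over rows of min(||row||_2, cap)."""
    row_norms = np.linalg.norm(residuals, axis=1)
    return float(np.minimum(row_norms, cap).sum())


def rows_at_cap(residuals, cap):
    """Boolean mask of rows whose Euclidean norm reaches the cap."""
    row_norms = np.linalg.norm(residuals, axis=1)
    return row_norms >= cap


# Hypothetical residuals: one row per labeled instance.
# The third row is far from its labeled cluster (a possible mislabel).
residuals = np.array([[0.3, 0.4],
                      [0.0, 0.1],
                      [3.0, 4.0]])
cap = 1.0  # hypothetical threshold

plain = float(np.linalg.norm(residuals, axis=1).sum())  # uncapped l21 norm
capped = capped_l21_norm(residuals, cap)
flagged = rows_at_cap(residuals, cap)

print("uncapped:", plain)
print("capped:  ", capped)
print("flagged: ", flagged)
```

In this toy case the third row dominates the uncapped sum but is clipped to the cap in the capped version, and the mask marks it as a candidate for closer inspection. How CapS3FCM actually sets the threshold and uses the two norms is described in the paper itself, not here.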


Source journal

Fuzzy Sets and Systems (Mathematics; Computer Science: Theory & Methods)

CiteScore: 6.50

Self-citation rate: 17.90%

Articles published: 321

Review time: 6.1 months

About the journal: Since its launch in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well-organized corpus of basic notions including (but not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, and a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies. In mathematics, fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, and analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.