Federated semi-supervised learning via globally guided pseudo-labeling: A robust approach for label-scarce scenarios

IF 7.5; Zone 1 (Computer Science); Q1 (Computer Science, Artificial Intelligence)

Expert Systems with Applications, Volume 294, Article 128667. Pub Date: 2025-06-25. DOI: 10.1016/j.eswa.2025.128667

Yuan Xi, Qiong Li, Haokun Mao


Citations: 0


Federated semi-supervised learning via globally guided pseudo-labeling: A robust approach for label-scarce scenarios

Federated Semi-Supervised Learning (FSSL) is a powerful paradigm for collaboratively training models on both labeled and unlabeled datasets, which is adopted in domains such as healthcare and IoT. However, heterogeneous data distributions and imbalanced labeling capabilities both lead to significant prediction bias across participating clients, further resulting in skewed pseudo-labels during the local training stage. Most existing FSSL studies address the bias by improving model consistency, which relies on a well-trained benchmark derived from the fully labeled client, and encounters challenges in label-scarce scenarios. In this paper, we propose a novel FSSL method, namely Federated Globally Guided pseudo-labeling (FedGGp), suitable for both label-scarce and Non-Independent and Identically Distributed (Non-IID) scenarios. Specifically, this strategy summarizes the prediction bias assessments based on skewed class predictions, and modifies pseudo-labeling indicators accordingly in the subsequent iteration. For advantageous classes, FedGGp employs adaptive thresholds to generate high-quality pseudo-labels, while for discriminated classes, it expands the number of pseudo-labels to ensure balanced model training. Moreover, soft consistency regularization is applied to broaden the boundary of pseudo-labels for some underrepresented classes, which are typically ambiguous during classifications. The experimental results on four different datasets demonstrate that FedGGp outperforms various state-of-the-art methods.
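The abstract does not give the formulas FedGGp uses, so the following is only a minimal sketch of the general idea of class-wise thresholds driven by globally aggregated prediction statistics. All function names, the linear mapping from class frequency to threshold, and the `base` and `floor` values are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Sequence


def aggregate_class_counts(client_counts: Sequence[Sequence[int]]) -> List[int]:
    """Server side: sum the per-class pseudo-label counts reported by clients."""
    return [sum(col) for col in zip(*client_counts)]


def class_thresholds(global_counts: Sequence[int],
                     base: float = 0.95,
                     floor: float = 0.5) -> List[float]:
    """Map global class frequency to a per-class confidence threshold.

    Assumed rule (hypothetical): the most frequently predicted class keeps
    the strict `base` threshold; rarer classes get linearly lower
    thresholds, never below `floor`, so more of their samples are accepted.
    """
    top = max(global_counts)
    if top == 0:
        # No predictions yet: fall back to the strict threshold everywhere.
        return [base] * len(global_counts)
    return [floor + (base - floor) * (c / top) for c in global_counts]


def pseudo_label(probs: Sequence[float],
                 thresholds: Sequence[float]) -> Optional[int]:
    """Client side: return the argmax class if its confidence clears that
    class's threshold, otherwise None (the sample is skipped this round)."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return k if probs[k] >= thresholds[k] else None


# Two clients whose predictions are skewed toward class 0.
counts = aggregate_class_counts([[90, 10, 0], [80, 15, 5]])
thresholds = class_thresholds(counts)

print(pseudo_label([0.90, 0.06, 0.04], thresholds))  # None: class 0 needs 0.95
print(pseudo_label([0.30, 0.10, 0.60], thresholds))  # 2: rare class, lower bar
```

In this sketch the dominant class must clear a strict bar, which favours label quality, while rarely predicted classes are admitted at lower confidence, which favours label quantity. That mirrors the trade-off the abstract describes for advantageous and discriminated classes, without claiming to reproduce the authors' actual rule.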


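Source journal

Expert Systems with Applications (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months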

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.