Dynamic class-balanced threshold Federated Semi-Supervised Learning by exploring diffusion model and all unlabeled data

IF 6.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience | Pub Date: 2025-03-22 | DOI: 10.1016/j.future.2025.107820

Zeyuan Wang, Yang Liu, Guirong Liang, Cheng Zhong, Feng Yang

Volume 170, Article 107820 | https://www.sciencedirect.com/science/article/pii/S0167739X25001153

Citations: 0

Abstract

Federated Semi-Supervised Learning (FSSL) trains models in a federated learning setting using a small amount of labeled data and a large amount of unlabeled data. Its two major challenges are the scarcity of labeled data and non-independent and identically distributed (non-IID) data. Most previous methods apply a traditional fixed threshold to select high-confidence samples and assign pseudo-labels to them, without considering low-confidence samples, and then enlarge the sample space through random sampling and other techniques. Even so, the performance of these models remains unsatisfactory. To tackle these challenges, we propose DDRFed, a novel FSSL framework that uses all available data by integrating a diffusion model with dynamic class-balanced thresholds. Specifically, we first mitigate the client-side non-IID issue with a dataset generated by a diffusion model that is co-trained by the clients and conforms to the global data distribution. The local clients then use the global class distribution information provided by the server to establish dynamic class-balanced thresholds, which separate high-confidence from low-confidence samples. These dynamic thresholds ensure a sufficient amount of labeled data during training. Meanwhile, to fully leverage the knowledge contained in low-confidence samples, we optimize the model's performance through residual class negative learning. Experiments on two natural datasets demonstrate the superiority of DDRFed in addressing both major challenges of FSSL.
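The abstract does not give the threshold rule or the loss, so the following is only a minimal sketch of the general idea, not DDRFed's actual formulation. It assumes (hypothetically) that each class threshold is a base threshold scaled by the class's share of the global distribution, and that "residual classes" are the classes outside a sample's top-k predictions. The names `class_thresholds`, `split_by_confidence`, `negative_learning_loss`, `base_threshold`, and `k_keep` are illustrative and do not come from the paper.

```python
import math


def class_thresholds(global_class_dist, base_threshold=0.95):
    """Per-class confidence thresholds from a global class distribution.

    Assumed rule (not from the paper): the most frequent class keeps the
    base threshold, rarer classes get a lower bar, floored at half of it.
    """
    top = max(global_class_dist)
    return [base_threshold * (0.5 + 0.5 * p / top) for p in global_class_dist]


def split_by_confidence(probs_batch, thresholds):
    """Split samples into high-confidence (pseudo-labeled) and low-confidence.

    Returns (high, low): high is a list of (sample_index, pseudo_label),
    low is a list of sample indices.
    """
    high, low = [], []
    for i, probs in enumerate(probs_batch):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= thresholds[label]:
            high.append((i, label))
        else:
            low.append(i)
    return high, low


def negative_learning_loss(probs, k_keep=2, eps=1e-12):
    """Negative-learning loss for one low-confidence sample.

    Assumption: classes outside the top-k predictions are treated as
    "the sample is not this class", with loss -mean(log(1 - p_c)).
    """
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    residual = ranked[k_keep:]
    if not residual:
        return 0.0
    total = sum(math.log(max(1.0 - probs[c], eps)) for c in residual)
    return -total / len(residual)
```

With a global distribution of `[0.5, 0.3, 0.2]` and `base_threshold=0.9`, a sample predicted as the rarest class with confidence 0.7 is kept as a pseudo-label, whereas a single fixed threshold of 0.9 would discard it. Samples that still fall below their class threshold contribute through `negative_learning_loss` instead of being ignored.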

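Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months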

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.