Zeyuan Wang, Yang Liu, Guirong Liang, Cheng Zhong, Feng Yang
Title: Dynamic class-balanced threshold Federated Semi-Supervised Learning by exploring diffusion model and all unlabeled data
DOI: 10.1016/j.future.2025.107820
Journal: Future Generation Computer Systems-The International Journal of Escience, Volume 170, Article 107820
Publication date: 2025-03-22 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25001153
Impact factor: 6.2; JCR: Q1 (COMPUTER SCIENCE, THEORY & METHODS)
Citation count: 0
Abstract
Federated Semi-Supervised Learning (FSSL) aims to train models within the federated learning paradigm using a small amount of labeled data and a large amount of unlabeled data. The scarcity of labeled data and the issue of non-independent and identically distributed (non-IID) data are the major challenges faced by FSSL. Most previous methods use a traditional fixed threshold to select high-confidence samples and assign pseudo-labels to them, without considering low-confidence samples. These methods then enlarge the sample space through random sampling and other techniques to address the challenges of FSSL. However, the performance of these models remains unsatisfactory. To tackle these challenges, we propose DDRFed, a novel FSSL framework that effectively utilizes all available data by integrating a diffusion model and dynamic class-balanced thresholds. Specifically, we first mitigate the client-side non-IID issue by utilizing a dataset, generated by a co-trained client-side diffusion model, that conforms to the global data distribution. The local clients then use the global class distribution information provided by the server to establish dynamic class-balanced thresholds, which distinguish high-confidence from low-confidence samples. These dynamic thresholds ensure a sufficient amount of pseudo-labeled data throughout training. Meanwhile, to fully leverage the knowledge contained in low-confidence samples, we optimize the model's performance through residual class negative learning. Experiments conducted on two natural datasets demonstrate the superiority of DDRFed in addressing both major challenges of FSSL.
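The pipeline the abstract describes (per-class dynamic thresholds derived from the global class distribution, plus negative learning on low-confidence samples) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the frequency-proportional threshold rule, and the complementary-label loss below are illustrative assumptions.

```python
import numpy as np

def class_balanced_thresholds(global_class_dist, base_tau=0.95):
    """Hypothetical per-class thresholds: rarer classes get lower
    thresholds so they still contribute pseudo-labels."""
    dist = np.asarray(global_class_dist, dtype=float)
    dist = dist / dist.sum()
    # Scale the base threshold by each class's relative frequency.
    return base_tau * dist / dist.max()

def split_by_confidence(probs, thresholds):
    """Split softmax outputs into high-confidence samples (to be
    pseudo-labeled) and low-confidence ones (for negative learning)."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    high = conf >= thresholds[preds]
    return preds, high

def negative_learning_loss(probs, eps=1e-8):
    """Complementary-label loss on low-confidence samples: push down
    the probability of each sample's least likely class."""
    comp = probs.argmin(axis=1)
    p = probs[np.arange(len(probs)), comp]
    return float(-np.mean(np.log(1.0 - p + eps)))

# Example: class 0 is frequent, class 2 is rare, so class 2 keeps a
# lower threshold and can still produce pseudo-labels.
tau = class_balanced_thresholds([100, 50, 10])
probs = np.array([[0.97, 0.02, 0.01],
                  [0.40, 0.35, 0.25]])
preds, high = split_by_confidence(probs, tau)
nl_loss = negative_learning_loss(probs[~high])
```

In this sketch the first sample clears its class threshold and would receive a pseudo-label, while the second falls below it and contributes only to the negative-learning loss, so no unlabeled sample is discarded.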
Journal introduction:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.