SAFA: Handling Sparse and Scarce Data in Federated Learning With Accumulative Learning

IF 3.6 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers Pub Date : 2025-02-28 DOI:10.1109/TC.2025.3543682

Nang Hung Nguyen;Truong Thao Nguyen;Trong Nghia Hoang;Hieu H. Pham;Thanh Hung Nguyen;Phi Le Nguyen

IEEE Transactions on Computers, vol. 74, no. 6, pp. 1844-1856. Journal Article. https://ieeexplore.ieee.org/document/10908581/

Citations: 0

Abstract

Federated Learning (FL) has emerged as an effective paradigm that allows multiple parties to collaboratively train a global model while protecting their private data. However, the performance of FL approaches tends to degrade significantly when data are sparsely distributed across clients with small datasets. This is referred to as the sparse-and-scarce challenge, where the data held by each client is both sparse (it does not contain examples of all classes) and scarce (the dataset is small). Sparse-and-scarce data diminishes the generalizability of clients' data, leading to intensive over-fitting and massive domain shifts in the local models and, ultimately, decreasing the aggregated model's performance. Interestingly, while this scenario is a specific manifestation of the well-known non-IID challenge in FL (the generic situation where local data distributions are not independent and identically distributed), it has not been distinctly addressed. Our empirical investigation highlights that generic approaches to the non-IID challenge often prove inadequate in mitigating the sparse-and-scarce issue. To bridge this gap, we develop SAFA, a novel FL algorithm that specifically addresses the sparse-and-scarce challenge via a novel continual model iteration procedure. SAFA maximally exposes local models to the inter-client diversity of data with minimal effects of catastrophic forgetting. Our experiments show that SAFA outperforms existing FL solutions by up to 17.86% compared to the prominent baseline. The code is accessible via https://github.com/HungNguyen20/SAFA.
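To make the sparse-and-scarce setting concrete, the following is a minimal sketch of how such a client partition might be simulated. It is not taken from the SAFA paper or its repository: the function name `sparse_scarce_partition` and its parameters (`classes_per_client`, `samples_per_client`) are hypothetical, and the toy label list stands in for a real dataset. For simplicity, clients are sampled independently, so two clients may share examples.

```python
import random
from collections import defaultdict

def sparse_scarce_partition(labels, num_clients, classes_per_client,
                            samples_per_client, seed=0):
    """Hypothetical helper: give each client few classes and few samples."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    all_classes = sorted(by_class)

    partition = {}
    for client in range(num_clients):
        # Sparse: the client is restricted to a subset of the classes.
        client_classes = rng.sample(all_classes, classes_per_client)
        pool = [i for c in client_classes for i in by_class[c]]
        # Scarce: the client keeps only a small number of examples.
        partition[client] = rng.sample(pool, min(samples_per_client, len(pool)))
    return partition

# Toy data (an assumption for illustration): 10 classes, 100 examples each.
labels = [c for c in range(10) for _ in range(100)]
partition = sparse_scarce_partition(labels, num_clients=5,
                                    classes_per_client=2,
                                    samples_per_client=20)
for client, idxs in partition.items():
    # Print the client id, its dataset size, and the classes it holds.
    print(client, len(idxs), sorted({labels[i] for i in idxs}))
```

Under these assumptions, every client holds 20 examples drawn from at most 2 of the 10 classes, which is the kind of local dataset the abstract describes as both sparse and scarce.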


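Source journal

IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.60

Self-citation rate: 5.40%

Articles published: 199

Review time: 6.0 months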

Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.