SAFA: Handling Sparse and Scarce Data in Federated Learning With Accumulative Learning
Nang Hung Nguyen, Truong Thao Nguyen, Trong Nghia Hoang, Hieu H. Pham, Thanh Hung Nguyen, Phi Le Nguyen
IEEE Transactions on Computers, vol. 74, no. 6, pp. 1844-1856. Published 2025-02-28. DOI: 10.1109/TC.2025.3543682
https://ieeexplore.ieee.org/document/10908581/
Citation count: 0
Abstract
Federated Learning (FL) has emerged as an effective paradigm that allows multiple parties to collaboratively train a global model while protecting their private data. However, the performance of FL approaches tends to degrade significantly when data are sparsely distributed across clients with small datasets. This is referred to as the sparse-and-scarce challenge, where the data held by each client is both sparse (it does not contain examples of all classes) and scarce (the dataset is small). Sparse-and-scarce data diminishes the generalizability of clients' data, leading to severe over-fitting and large domain shifts in the local models and, ultimately, degrading the aggregated model's performance. Interestingly, while this scenario is a specific manifestation of the well-known non-IID challenge in FL (the generic situation where local data are not independent and identically distributed), it has not been distinctly addressed. Our empirical investigation highlights that generic approaches to the non-IID challenge often prove inadequate in mitigating the sparse-and-scarce issue. To bridge this gap, we develop SAFA, a novel FL algorithm that specifically addresses the sparse-and-scarce challenge via a novel continual model iteration procedure. SAFA maximally exposes local models to the inter-client diversity of data while keeping the effects of catastrophic forgetting minimal. Our experiments show that SAFA outperforms existing FL solutions, with gains of up to 17.86% over the prominent baseline. The code is accessible via https://github.com/HungNguyen20/SAFA.
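To make the sparse-and-scarce setting concrete, the following is a minimal sketch of how such a client partition could be simulated. It is not taken from the SAFA code base: the function names `sparse_scarce_partition` and `class_coverage`, the parameters `classes_per_client` and `samples_per_client`, and the synthetic 10-class label list are all assumptions made for illustration. It illustrates only the data setting, not the SAFA algorithm itself.

```python
import random
from collections import defaultdict


def sparse_scarce_partition(labels, num_clients, classes_per_client,
                            samples_per_client, seed=0):
    """Hypothetical helper: give each client few classes and few samples.

    Sparse: each client draws from only `classes_per_client` classes.
    Scarce: each client keeps at most `samples_per_client` examples.
    Clients sample independently, so indices may be shared across clients.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    all_classes = sorted(by_class)

    partition = {}
    for client in range(num_clients):
        # Pick a small subset of classes for this client (sparsity).
        chosen = rng.sample(all_classes, classes_per_client)
        pool = [i for c in chosen for i in by_class[c]]
        # Keep only a handful of examples from those classes (scarcity).
        partition[client] = rng.sample(pool, min(samples_per_client, len(pool)))
    return partition


def class_coverage(partition, labels):
    """Return the sorted list of classes each client actually holds."""
    return {client: sorted({labels[i] for i in idxs})
            for client, idxs in partition.items()}


# Synthetic dataset: 1000 examples spread evenly over 10 classes (assumed).
labels = [i % 10 for i in range(1000)]
partition = sparse_scarce_partition(labels, num_clients=5,
                                    classes_per_client=2,
                                    samples_per_client=20)
coverage = class_coverage(partition, labels)
for client, classes in coverage.items():
    print(client, len(partition[client]), classes)
```

Under these assumed settings, every client sees at most 2 of the 10 classes and holds only 20 examples, which is the combination of missing classes and small local datasets that the abstract describes.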
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.