Dataset Distillation-Based Hybrid Federated Learning on Non-IID Data

IF 7.9 | Zone 2, Computer Science | Q1 ENGINEERING, MULTIDISCIPLINARY

IEEE Transactions on Network Science and Engineering | Pub Date: 2026-01-01 | Epub Date: 2026-03-31 | DOI: 10.1109/TNSE.2026.3679013

Xiufang Shi; Wei Zhang; Yuheng Li; Mincheng Wu; Zhenyu Wen; Shibo He; Tejal Shah; Rajiv Ranjan

Volume 13, pages 8331-8347 | Publication type: Journal Article | Open access: no | Full text: https://ieeexplore.ieee.org/document/11458621/

Citations: 0

Abstract

In federated learning, heterogeneity in client data strongly affects model training performance. Much of this heterogeneity stems from non-independent and identically distributed (non-IID) data. To address label distribution skew, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and identically distributed (IID) data, thereby improving training performance. Specifically, we partition the clients into heterogeneous clusters such that the data labels are unbalanced among clients within a cluster but balanced across clusters. The cluster heads collect distilled data from their cluster members and train the model in collaboration with the server. This training process resembles traditional federated learning on IID data and hence effectively alleviates the impact of non-IID data on model training. We perform a comprehensive analysis of the convergence behavior, communication overhead, and computational complexity of the proposed HFLDD. Extensive experiments on multiple public datasets demonstrate that, when data labels are severely imbalanced, HFLDD outperforms the baseline methods in both test accuracy and communication cost.
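The clustering idea in the abstract (labels skewed within each client, but balanced once a cluster's data are pooled) can be illustrated with a small sketch. The abstract does not specify how HFLDD actually forms its clusters, so everything below is an assumption made for illustration: the greedy heuristic, the total-variation imbalance measure, the per-cluster size cap, and the function names `label_histogram`, `imbalance`, and `greedy_balanced_clusters` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import ceil


def label_histogram(labels, num_classes):
    """Count how many samples of each class a client holds."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(num_classes)]


def imbalance(hist):
    """Total variation distance between the label distribution and uniform.

    0.0 means perfectly balanced labels; larger values mean more skew.
    """
    total = sum(hist)
    if total == 0:
        return 1.0
    uniform = 1.0 / len(hist)
    return 0.5 * sum(abs(h / total - uniform) for h in hist)


def greedy_balanced_clusters(client_hists, num_clusters):
    """Assign clients to clusters so that each cluster's pooled labels are
    as close to uniform as a greedy pass can make them.

    Hypothetical heuristic, not the paper's procedure. Cluster sizes are
    capped so that no single cluster absorbs every client.
    """
    num_classes = len(client_hists[0])
    cap = ceil(len(client_hists) / num_clusters)
    clusters = [[] for _ in range(num_clusters)]
    totals = [[0] * num_classes for _ in range(num_clusters)]

    # Place clients with more data first; they constrain the result most.
    order = sorted(range(len(client_hists)), key=lambda i: -sum(client_hists[i]))
    for i in order:
        best, best_score = None, None
        for c in range(num_clusters):
            if len(clusters[c]) >= cap:
                continue  # this cluster is already full
            merged = [a + b for a, b in zip(totals[c], client_hists[i])]
            # Prefer lower pooled imbalance, then the smaller cluster.
            score = (imbalance(merged), len(clusters[c]))
            if best_score is None or score < best_score:
                best, best_score = c, score
        clusters[best].append(i)
        totals[best] = [a + b for a, b in zip(totals[best], client_hists[i])]
    return clusters, totals


if __name__ == "__main__":
    # Four clients, each holding a single class (extreme label skew).
    client_labels = [[0] * 10, [0] * 10, [1] * 10, [1] * 10]
    hists = [label_histogram(labels, 2) for labels in client_labels]
    clusters, totals = greedy_balanced_clusters(hists, num_clusters=2)
    for members, total in zip(clusters, totals):
        print(members, total, imbalance(total))
```

In this toy setting each client is maximally skewed on its own, yet each resulting cluster pools both classes equally, which is the property that would let a cluster head holding its members' distilled data train as if on approximately IID data.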


Source journal

IEEE Transactions on Network Science and Engineering (Engineering: Control and Systems Engineering)

CiteScore: 12.60

Self-citation rate: 9.10%

Articles published: 393

Journal description: The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles that deal with the theory and applications of network science and the interconnections among the elements in a system that form a network. In particular, the journal publishes articles on understanding, prediction, and control of structures and behaviors of networks at the fundamental level, with the aim of discovering common principles that govern network structures, functionalities, and behaviors. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. Another trans-disciplinary focus of the journal is the interactions between and co-evolution of different genres of networks.