DFedADMM: Dual Constraint Controlled Model Inconsistency for Decentralize Federated Learning

IF 18.6
Qinglun Li;Li Shen;Guanghao Li;Quanjun Yin;Dacheng Tao
DOI: 10.1109/TPAMI.2025.3546659
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4803-4815
Published: 2025-02-28
URL: https://ieeexplore.ieee.org/document/10908045/
Citation count: 0

Abstract

To reduce the communication burden of Federated Learning (FL), Decentralized Federated Learning (DFL) discards the central server and establishes a decentralized communication network in which each client communicates only with its neighboring clients. However, existing DFL methods have not fundamentally addressed two major challenges: local inconsistency and local heterogeneous overfitting. To tackle these issues, we propose two novel DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM. DFedADMM employs primal-dual optimization (ADMM), using dual variables to control the model inconsistency arising from decentralized heterogeneous data distributions. DFedADMM-SAM further improves on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which uses gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, thereby mitigating local heterogeneous overfitting. Theoretically, we derive convergence rates of $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}\right)$ and $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}+\frac{1}{T^{3/2}K^{1/2}}\right)$ in the non-convex setting for DFedADMM and DFedADMM-SAM, respectively, where $1 - \psi$ is the spectral gap of the gossip matrix. Empirically, extensive experiments on the MNIST, CIFAR10, and CIFAR100 datasets demonstrate that our algorithms outperform existing state-of-the-art (SOTA) optimizers in DFL in terms of generalization, convergence speed, and communication overhead.
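Two quantities named in the abstract can be made concrete with a small sketch: the spectral gap $1 - \psi$ of a gossip matrix and the gradient perturbation used by SAM. The code below is illustrative only and is not the authors' implementation. It assumes a ring topology with uniform weights of 1/3, takes $\psi$ to be the second-largest eigenvalue magnitude of the gossip matrix (a common convention, not stated in the abstract), and uses hypothetical helper names `ring_gossip_matrix`, `spectral_gap`, and `sam_gradient`.

```python
import numpy as np


def ring_gossip_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n clients.

    Each client averages itself and its two ring neighbors with weight 1/3.
    The ring topology and the weights are assumptions made for illustration.
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 clients")
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def spectral_gap(W: np.ndarray) -> float:
    """Return 1 - psi, with psi the second-largest eigenvalue magnitude of W."""
    eigvals = np.linalg.eigvalsh(W)  # W is symmetric, so eigenvalues are real
    mags = np.sort(np.abs(eigvals))[::-1]  # descending; mags[0] is 1
    return float(1.0 - mags[1])


def sam_gradient(grad_fn, w: np.ndarray, rho: float) -> np.ndarray:
    """Gradient evaluated at the SAM-perturbed point w + rho * g / ||g||."""
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g  # already at a stationary point; no perturbation direction
    return grad_fn(w + rho * g / norm)


if __name__ == "__main__":
    # A sparser (larger) ring mixes more slowly, so its spectral gap shrinks.
    for n in (4, 8, 16):
        print(n, spectral_gap(ring_gossip_matrix(n)))

    # One gossip step: every client replaces its model by a neighbor average.
    models = np.arange(8, dtype=float).reshape(4, 2)  # 4 clients, 2 parameters
    mixed = ring_gossip_matrix(4) @ models
    print(mixed.mean(axis=0), models.mean(axis=0))  # network average is kept
```

Under these assumptions, a smaller spectral gap makes the $\frac{1}{KT(1-\psi)^{2}}$ term in both rates larger, which is how network connectivity enters the bounds quoted above.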