Qinglun Li;Li Shen;Guanghao Li;Quanjun Yin;Dacheng Tao
DFedADMM: Dual Constraint Controlled Model Inconsistency for Decentralize Federated Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4803-4815, published 2025-02-28
DOI: 10.1109/TPAMI.2025.3546659
https://ieeexplore.ieee.org/document/10908045/
Citations: 0
Abstract
To reduce the communication burden of Federated Learning (FL), Decentralized Federated Learning (DFL) discards the central server and establishes a decentralized communication network in which each client communicates only with its neighboring clients. However, existing DFL methods have not fundamentally addressed two major challenges: local inconsistency and local heterogeneous overfitting. To tackle these issues, we propose two novel DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM. DFedADMM employs primal-dual optimization (ADMM), using dual variables to control the model inconsistency arising from decentralized heterogeneous data distributions. DFedADMM-SAM further improves on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which uses gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, thereby mitigating local heterogeneous overfitting. Theoretically, in the non-convex setting we derive convergence rates of $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}\right)$ for DFedADMM and $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}+\frac{1}{T^{3/2}K^{1/2}}\right)$ for DFedADMM-SAM, where $1-\psi$ is the spectral gap of the gossip matrix. Empirically, extensive experiments on the MNIST, CIFAR10, and CIFAR100 datasets demonstrate that our algorithms outperform existing state-of-the-art (SOTA) optimizers in DFL in terms of generalization, convergence speed, and communication overhead.
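To make the role of the spectral gap $1-\psi$ in the rates above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that $\psi$ is the second-largest eigenvalue magnitude of a symmetric, doubly stochastic gossip matrix (a common convention that the abstract itself does not spell out), and it uses a ring topology with uniform weights of 1/3 purely for illustration. The helper names `ring_gossip_matrix`, `spectral_gap`, and `rate_terms` are hypothetical, and the rate terms are evaluated with all hidden constants set to 1.

```python
import numpy as np


def ring_gossip_matrix(n: int) -> np.ndarray:
    """Illustrative gossip matrix for a ring of n >= 3 clients.

    Each client averages itself and its two ring neighbours with weight 1/3,
    which makes the matrix symmetric and doubly stochastic.
    """
    if n < 3:
        raise ValueError("ring topology in this sketch needs n >= 3")
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def spectral_gap(W: np.ndarray) -> float:
    """Return 1 - psi, with psi assumed to be the second-largest
    eigenvalue magnitude of the symmetric matrix W."""
    eigvals = np.linalg.eigvalsh(W)            # real eigenvalues, ascending
    mags = np.sort(np.abs(eigvals))[::-1]      # magnitudes, descending
    return 1.0 - float(mags[1])


def rate_terms(K: int, T: int, gap: float) -> tuple:
    """Evaluate the two terms of the DFedADMM rate with constants set to 1."""
    stochastic_term = 1.0 / np.sqrt(K * T)
    topology_term = 1.0 / (K * T * gap ** 2)
    return float(stochastic_term), float(topology_term)


if __name__ == "__main__":
    K, T = 5, 1000  # hypothetical local steps and communication rounds
    for n in (4, 16, 64):
        gap = spectral_gap(ring_gossip_matrix(n))
        print(n, gap, rate_terms(K, T, gap))
```

Under these assumptions, a sparser topology (a larger ring) yields a smaller spectral gap, which inflates the $\frac{1}{KT(1-\psi)^{2}}$ term while leaving the $\frac{1}{\sqrt{KT}}$ term unchanged.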