DFedADMM: Dual Constraint Controlled Model Inconsistency for Decentralize Federated Learning

IF 18.6

IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-02-28 DOI:10.1109/TPAMI.2025.3546659

Qinglun Li;Li Shen;Guanghao Li;Quanjun Yin;Dacheng Tao

Volume 47, Issue 6, Pages 4803-4815. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10908045/

Citations: 0

Abstract

To address the communication burden of Federated Learning (FL), Decentralized Federated Learning (DFL) discards the central server and establishes a decentralized communication network in which each client communicates only with neighboring clients. However, existing DFL methods have not fundamentally addressed two major challenges: local inconsistency and local heterogeneous overfitting. To tackle these issues, we propose two novel DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM. DFedADMM employs primal-dual optimization (ADMM), using dual variables to control the model inconsistency that arises from decentralized heterogeneous data distributions. DFedADMM-SAM further improves on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which uses gradient perturbations to generate locally flat models and searches for models with uniformly low loss values to mitigate local heterogeneous overfitting. Theoretically, we derive convergence rates of $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}\right)$ and $\mathcal{O}\left(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^{2}}+\frac{1}{T^{3/2}K^{1/2}}\right)$ in the non-convex setting for DFedADMM and DFedADMM-SAM, respectively, where $1-\psi$ is the spectral gap of the gossip matrix. Empirically, extensive experiments on the MNIST, CIFAR10, and CIFAR100 datasets demonstrate that our algorithms outperform existing state-of-the-art (SOTA) optimizers in DFL in terms of generalization, convergence speed, and communication overhead.
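To make the role of the spectral gap concrete, the following is a minimal sketch, not the authors' code. It assumes a ring topology with uniform mixing weights (the abstract does not name a topology) and assumes the common convention that $\psi$ is the largest eigenvalue magnitude of the gossip matrix after the leading eigenvalue 1. The function names, the default self-weight of 1/3, and the constant-free `topology_term` are all hypothetical and chosen for illustration only.

```python
import math


def ring_gossip_eigenvalues(n, self_weight=1.0 / 3.0):
    """Eigenvalues of a symmetric ring gossip matrix with n clients.

    Assumed setup (not from the paper): each client keeps `self_weight`
    of its own model and splits the remainder equally between its two
    ring neighbors. Such a matrix is circulant, so its eigenvalues are
    self_weight + (1 - self_weight) * cos(2*pi*k/n) for k = 0..n-1.
    """
    if n < 3:
        raise ValueError("a ring with two distinct neighbors needs n >= 3")
    neighbor_weight = (1.0 - self_weight) / 2.0
    return [
        self_weight + 2.0 * neighbor_weight * math.cos(2.0 * math.pi * k / n)
        for k in range(n)
    ]


def spectral_gap(eigenvalues):
    """Return 1 - psi, with psi = max(|lambda_2|, |lambda_n|) (assumed convention)."""
    ordered = sorted(eigenvalues, reverse=True)  # ordered[0] is the leading eigenvalue 1
    psi = max(abs(ordered[1]), abs(ordered[-1]))
    return 1.0 - psi


def topology_term(K, T, gap):
    """The 1 / (K * T * (1 - psi)^2) term of the stated rates, constants omitted."""
    return 1.0 / (K * T * gap ** 2)


if __name__ == "__main__":
    # K and T below are arbitrary illustrative values, not experimental settings.
    for n in (4, 8, 32):
        gap = spectral_gap(ring_gossip_eigenvalues(n))
        print(n, gap, topology_term(K=5, T=1000, gap=gap))
```

Under these assumptions the gap shrinks as the ring grows, so the topology-dependent term in both rates increases for sparser, larger networks, while the $\frac{1}{\sqrt{KT}}$ term is unaffected by the topology.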

