Straggler-Resilient Asynchronous ADMM for Distributed Consensus Optimization

IF 4.6 | CAS Zone 2 (Engineering & Technology) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Signal Processing, published 2025-06-16, DOI: 10.1109/TSP.2025.3579628

Jeannie He;Ming Xiao;Mikael Skoglund;Harold Vincent Poor

IEEE Transactions on Signal Processing, vol. 73, pp. 2496-2510.

Citations: 0

Abstract

Owing to its simplicity, well-established convergence properties, and applicability to a wide range of optimization problems, the alternating direction method of multipliers (ADMM) has been widely used in many fields. However, when applied in distributed systems, the method can suffer from stragglers (nodes whose response times are significantly longer than those of the others) and single points of failure (a single node whose failure brings down the entire system). To address these problems, we propose three straggler-resilient ADMM algorithms. The first is a centralized algorithm that achieves straggler resilience by allowing the nodes to proceed to the next iteration even when one or more nodes have not provided an update for one or more iterations. The second extends the first to achieve resilience to single points of failure and fast convergence through decentralized, asynchronous, and concurrent operations. The third extends the second to also achieve robustness against uncertainties with the help of a time-tracking scheme. Through theoretical analyses, we establish the convergence properties of our algorithms and show that they achieve a computational complexity of $\mathcal{O}(1)$ for each worker node; the exception is the central node in the centralized algorithm, whose workload complexity is $\mathcal{O}(N)$. Through numerical simulations under various settings, we show that our algorithms converge significantly faster than several state-of-the-art ADMM algorithms with well-established convergence properties.
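To make the idea behind the first algorithm concrete (a coordinator that moves on to the next iteration without waiting for stragglers), here is a minimal Python sketch. It is not the authors' algorithm. It assumes scalar local objectives $f_i(x) = \frac{a_i}{2}(x - b_i)^2$ and writes consensus ADMM in its Douglas-Rachford form: worker $i$ reports $w_i = x_i + u_i$, the coordinator averages the latest reports, and a worker that misses a round is represented by its previous report. Stragglers are simulated by letting each worker respond independently with probability `p_respond`. The function names, parameters, and the variable $w_i$ are hypothetical. For these objectives the consensus optimum is $\sum_i a_i b_i / \sum_i a_i$.

```python
import random


def prox_quadratic(a, b, rho, v):
    """Closed-form argmin over x of (a/2)(x - b)^2 + (rho/2)(x - v)^2."""
    return (a * b + rho * v) / (a + rho)


def straggler_consensus_admm(a, b, rho=1.0, p_respond=0.5, iters=500, seed=0):
    """Hypothetical sketch: consensus ADMM in Douglas-Rachford form where the
    coordinator never waits; stragglers are represented by their last report."""
    rng = random.Random(seed)
    n = len(a)
    w = [0.0] * n  # w[i] = x_i + u_i, the most recent report from worker i
    for _ in range(iters):
        z = sum(w) / n  # coordinator averages the latest reports: O(N)
        for i in range(n):
            if rng.random() < p_respond:  # worker i responds in this round
                # Same as: u_i <- u_i + x_i - z; x_i <- prox(z - u_i);
                # report w_i = x_i + u_i. Constant work per worker: O(1).
                w[i] += prox_quadratic(a[i], b[i], rho, 2.0 * z - w[i]) - z
            # A straggler keeps its old w[i]; the next round proceeds anyway.
    return sum(w) / n  # final consensus estimate z
```

For example, `straggler_consensus_admm([1.0, 2.0, 3.0, 0.5], [4.0, -1.0, 2.0, 10.0])` returns a value very close to the optimum $13 / 6.5 = 2$, even though roughly half of the workers are silent in each round. In this sketch each responding worker does $\mathcal{O}(1)$ work while the coordinator's average costs $\mathcal{O}(N)$. Its convergence rests on each round being a randomized block-coordinate step of the firmly nonexpansive Douglas-Rachford operator; it does not model the decentralized or time-tracking variants described above.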




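Source journal: IEEE Transactions on Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.20

Self-citation rate: 9.30%

Articles published: 310

Review time: 3.0 months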

Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.