Knowledge- and Model-Driven Deep Reinforcement Learning for Efficient Federated Edge Learning: Single- and Multi-Agent Frameworks

IEEE Transactions on Machine Learning in Communications and Networking, vol. 3, pp. 332-352. Pub Date: 2025-01-27. DOI: 10.1109/TMLCN.2025.3534754

Yangchen Li, Lingzhi Zhao, Tianle Wang, Lianghui Ding, Feng Yang

Citations: 0

Abstract

Knowledge- and Model-Driven Deep Reinforcement Learning for Efficient Federated Edge Learning: Single- and Multi-Agent Frameworks

In this paper, we investigate federated learning (FL) efficiency improvement in practical edge computing systems, where edge workers have non-independent and identically distributed (non-IID) local data, as well as dynamic and heterogeneous computing and communication capabilities. We consider a general FL algorithm with configurable parameters, including the number of local iterations, mini-batch sizes, step sizes, aggregation weights, and quantization parameters, and provide a rigorous convergence analysis. We formulate a joint optimization problem for FL worker selection and algorithm parameter configuration to minimize the final test loss subject to time and energy constraints. The resulting problem is a complicated stochastic sequential decision-making problem with an implicit objective function and unknown transition probabilities. To address these challenges, we propose knowledge/model-driven single-agent and multi-agent deep reinforcement learning (DRL) frameworks. We transform the primal problem into a Markov decision process (MDP) for the single-agent DRL framework and a decentralized partially-observable Markov decision process (Dec-POMDP) for the multi-agent DRL framework. We develop efficient single-agent and multi-agent asynchronous advantage actor-critic (A3C) approaches to solve the MDP and Dec-POMDP, respectively. In both frameworks, we design a knowledge-based reward to facilitate effective DRL and propose a model-based stochastic policy to tackle the mixed discrete-continuous actions and large action spaces. To reduce the computational complexities of policy learning and execution, we introduce a segmented actor-critic architecture for the single-agent DRL and a distributed actor-critic architecture for the multi-agent DRL. Numerical results demonstrate the effectiveness and advantages of the proposed frameworks in enhancing FL efficiency.
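The abstract lists the configurable parameters of the FL algorithm but not its update rules. As a rough illustration only, the following Python sketch shows what one round of such an algorithm might look like: selected workers run mini-batch SGD locally, quantize their model updates, and the server combines them with aggregation weights. Everything here is an assumption for illustration and not the authors' method: the function names (`quantize`, `local_update`, `fl_round`, `run_demo`), the least-squares loss, the stochastic uniform quantizer, the fixed per-worker configuration, and the uniform random worker selection, which merely stands in for the learned DRL policy.

```python
import random


def quantize(vec, levels, rng):
    """Unbiased stochastic uniform quantization onto `levels` steps per sign (assumed scheme)."""
    norm = max(abs(v) for v in vec)
    if norm == 0.0:
        return list(vec)
    out = []
    for v in vec:
        scaled = abs(v) / norm * levels          # lies in [0, levels]
        lower = int(scaled)                      # floor of a non-negative number
        # Round up with probability equal to the fractional part, so E[q] = scaled.
        q = lower + (1 if rng.random() < scaled - lower else 0)
        out.append((1 if v >= 0 else -1) * norm * q / levels)
    return out


def local_update(w, data, num_iters, batch_size, step_size, rng):
    """Run mini-batch SGD on a least-squares loss and return the model delta."""
    w_local = list(w)
    for _ in range(num_iters):
        batch = rng.sample(data, min(batch_size, len(data)))
        grad = [0.0] * len(w)
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w_local, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(batch)
        w_local = [wi - step_size * g for wi, g in zip(w_local, grad)]
    return [wl - wg for wl, wg in zip(w_local, w)]


def fl_round(w, workers, selected, config, rng):
    """One round: local training, quantized upload, weighted aggregation at the server."""
    total_weight = sum(config[k]["weight"] for k in selected)
    agg = [0.0] * len(w)
    for k in selected:
        c = config[k]
        delta = local_update(w, workers[k], c["iters"], c["batch"], c["step"], rng)
        delta = quantize(delta, c["levels"], rng)
        for j, d in enumerate(delta):
            agg[j] += c["weight"] / total_weight * d
    return [wi + a for wi, a in zip(w, agg)]


def mean_loss(w, workers):
    """Average least-squares loss over the data of all workers."""
    samples = [s for data in workers for s in data]
    return sum(0.5 * (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in samples) / len(samples)


def run_demo(rounds=50, seed=0):
    """Train on synthetic non-IID data; return (initial_loss, final_loss)."""
    rng = random.Random(seed)
    w_true = [2.0, -1.0]
    # Non-IID: each worker observes a different range for each feature.
    scales = [(1.0, 0.1), (0.1, 1.0), (1.0, 1.0)]
    workers = []
    for s0, s1 in scales:
        data = []
        for _ in range(20):
            x = (s0 * rng.uniform(-1, 1), s1 * rng.uniform(-1, 1))
            data.append((x, w_true[0] * x[0] + w_true[1] * x[1]))
        workers.append(data)
    # Heterogeneous per-worker parameters (fixed here; hypothetical values).
    config = [
        {"iters": 5, "batch": 4, "step": 0.2, "weight": 1.0, "levels": 16},
        {"iters": 3, "batch": 8, "step": 0.1, "weight": 2.0, "levels": 8},
        {"iters": 2, "batch": 2, "step": 0.2, "weight": 1.0, "levels": 16},
    ]
    w = [0.0, 0.0]
    initial = mean_loss(w, workers)
    for _ in range(rounds):
        # Placeholder for a learned worker-selection policy.
        selected = rng.sample(range(len(workers)), 2)
        w = fl_round(w, workers, selected, config, rng)
    return initial, mean_loss(w, workers)
```

Calling `run_demo()` returns the loss before and after training. In the setting the abstract describes, the selection and the per-worker parameters would be chosen each round by the DRL agent under time and energy constraints rather than held fixed as they are in this sketch.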

