{"title":"Partially Observable Multi-Agent Deep Reinforcement Learning for Cognitive Resource Management","authors":"Ning Yang, Haijun Zhang, R. Berry","doi":"10.1109/GLOBECOM42002.2020.9322150","DOIUrl":null,"url":null,"abstract":"In this paper, the problem of dynamic resource management in a cognitive radio network (CRN) with multiple primary users (PUs), multiple secondary users (SUs), and multiple channels is investigated. An optimization problem is formulated as a multi-agent partially observable Markov decision process (POMDP) problem in a dynamic and not fully observable environment. We consider using deep reinforcement learning (DRL) to address this problem. Based on the channel occupancy of PUs, a multi-agent deep Q-network (DQN)-based dynamic joint spectrum access and mode selection (SAMS) scheme is proposed for the SUs in the partially observable environment. The current observation of each SU is mapped to a suitable action. Each secondary user (SU) takes its own decision without exchanging information with other SUs. It seeks to maximize the total sum rate. Simulation results verify the effectiveness of our proposed schemes.","PeriodicalId":12759,"journal":{"name":"GLOBECOM 2020 - 2020 IEEE Global Communications Conference","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GLOBECOM 2020 - 2020 IEEE Global Communications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GLOBECOM42002.2020.9322150","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
In this paper, the problem of dynamic resource management in a cognitive radio network (CRN) with multiple primary users (PUs), multiple secondary users (SUs), and multiple channels is investigated. The problem is formulated as a multi-agent partially observable Markov decision process (POMDP) in a dynamic, partially observable environment, and deep reinforcement learning (DRL) is used to address it. Based on the channel occupancy of the PUs, a multi-agent deep Q-network (DQN)-based dynamic joint spectrum access and mode selection (SAMS) scheme is proposed for the SUs in this partially observable environment. The current observation of each SU is mapped to a suitable action, and each SU makes its own decision without exchanging information with other SUs. The objective is to maximize the total sum rate of the SUs. Simulation results verify the effectiveness of the proposed scheme.
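The abstract does not specify implementation details, but the core idea of the SAMS scheme can be illustrated with a minimal sketch: each SU holds an independent Q-network that maps its local, partial observation of channel occupancy to a joint (channel, mode) action, with no inter-SU communication. All sizes, the mode semantics, the network architecture, and the epsilon-greedy policy below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative sizes -- the paper does not specify these (assumptions).
NUM_CHANNELS = 4          # channels an SU can sense/access (assumed)
NUM_MODES = 2             # e.g., two transmission modes to select between (assumed)
OBS_DIM = NUM_CHANNELS    # each SU observes a partial channel-occupancy vector


class SUQNetwork(nn.Module):
    """Per-SU Q-network: maps a local observation to Q-values over the
    joint (channel, mode) action space. Each SU has its own copy, so no
    information is exchanged between SUs at decision time."""

    def __init__(self, obs_dim=OBS_DIM, num_actions=NUM_CHANNELS * NUM_MODES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),   # hidden sizes are assumptions
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)


def select_action(q_net, obs, epsilon=0.1):
    """Epsilon-greedy selection from the SU's own observation only."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(NUM_CHANNELS * NUM_MODES, (1,)).item()
    with torch.no_grad():
        return q_net(obs).argmax().item()


# Each SU is an independent agent deciding from its local observation.
agents = [SUQNetwork() for _ in range(3)]      # 3 SUs (assumed)
obs = torch.rand(OBS_DIM)                      # dummy partial observation
action = select_action(agents[0], obs)
channel, mode = divmod(action, NUM_MODES)      # decode the joint action
print(f"SU 0 -> access channel {channel}, use mode {mode}")
```

Encoding spectrum access and mode selection as a single flat discrete action lets a standard DQN head cover the joint decision; training each agent's network against a sum-rate reward would then follow the usual DQN recipe (replay buffer, target network), which is omitted here for brevity.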