Delay-Aware Power Control for Downlink Multi-User MIMO via Constrained Deep Reinforcement Learning

2021 IEEE Global Communications Conference (GLOBECOM). Pub Date: 2021-12-01. DOI: 10.1109/GLOBECOM46510.2021.9685617

Chang Tian, G. Huang, An Liu, Wu Luo


Citations: 0

Abstract



We investigate downlink transmission in a multi-user multiple-input multiple-output (MU-MIMO) system that adopts the regularized zero-forcing (RZF) precoder, and we optimize both the power allocation and the regularization factor. Our aim is to find a control policy for the power allocation and regularization factor that minimizes the long-term average power consumption subject to a long-term delay constraint for each user. The resulting optimization problem is formulated as a constrained Markov decision process (CMDP) and is solved efficiently by the proposed constrained deep reinforcement learning algorithm, called successive convex approximation policy optimization (SCAPO). SCAPO solves a sequence of convex objective/feasibility optimization problems obtained by replacing the objective and constraint functions of the original problem with convex surrogate functions. At each iteration, SCAPO only needs to estimate first-order information and solve a convex surrogate problem, which can be handled efficiently in parallel. Moreover, SCAPO can reuse old experiences from previous updates, which significantly reduces the implementation cost. Numerical results show that SCAPO achieves state-of-the-art performance compared with advanced baselines.
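To make the two control variables concrete, the following is a minimal NumPy sketch of a textbook RZF precoder with per-user power allocation and a scalar regularization factor. The abstract does not give the system model, so everything here is an assumption for illustration: the row-wise channel convention, the unit-norm column normalization, the noise power, and the names `rzf_precoder`, `sinr`, `powers`, and `alpha` are hypothetical and not taken from the paper.

```python
import numpy as np

def rzf_precoder(H, powers, alpha):
    """Textbook RZF precoder (illustrative, not the paper's exact model).

    H:      (K, M) complex channel matrix, row k is user k's channel.
    powers: (K,) transmit power allocated to each user's stream.
    alpha:  positive regularization factor.
    """
    K = H.shape[0]
    # Unnormalized RZF directions H^H (H H^H + alpha I)^{-1}, shape (M, K).
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    # Normalize each column so that powers[k] is the power of stream k.
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.sqrt(powers)

def sinr(H, V, noise_power):
    """Per-user SINR for the received signal y = H V s + n."""
    G = np.abs(H @ V) ** 2            # G[k, j]: power of stream j seen by user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return signal / (interference + noise_power)

# Assumed toy setup: 4 single-antenna users, 8 base-station antennas.
rng = np.random.default_rng(0)
K, M = 4, 8
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
powers = np.array([1.0, 0.5, 2.0, 1.5])

V = rzf_precoder(H, powers, alpha=0.1)
rates = np.log2(1.0 + sinr(H, V, noise_power=1.0))   # bits/s/Hz per user
total_power = np.sum(np.abs(V) ** 2)                 # equals powers.sum()
```

In a delay-aware setting, per-user rates such as `rates` would typically drain each user's data queue, so a policy that picks `powers` and `alpha` in each time slot trades transmit power against queueing delay. How the paper models the queues and the delay constraint is not stated in the abstract.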

