Model-Free Learning of Optimal Deterministic Resource Allocations in Wireless Systems via Action-Space Exploration

2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). Pub Date: 2021-08-23. DOI: 10.1109/mlsp52302.2021.9596327

Hassaan Hashmi, Dionysios S. Kalogerias


Citations: 2

Abstract

Wireless systems resource allocation refers to perpetual and challenging nonconvex constrained optimization tasks, which are especially timely in modern communications and networking setups involving multiple users with heterogeneous objectives and imprecise or even unknown models and/or channel statistics. In this paper, we propose a technically grounded and scalable primal-dual deterministic policy gradient method for efficiently learning optimal parameterized resource allocation policies. Our method not only efficiently exploits gradient availability of popular universal policy representations, such as deep neural networks, but is also truly model-free, as it relies on consistent zeroth-order gradient approximations of the associated random network services constructed via low-dimensional perturbations in action space, thus fully bypassing any dependence on critics. Both theory and numerical simulations confirm the efficacy and applicability of the proposed approach, as well as its superiority over the current state of the art in terms of both achieving near-optimal performance and scalability.
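The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or code; it is a toy instance written under several assumptions that do not come from the source: a single user, exponentially distributed fading `h`, a rate service `log(1 + h * p)` that the learner can only query as a black box, an average power budget `p_max`, and a two-parameter softplus policy. All function names, step sizes, and the smoothing parameter `mu` are hypothetical. The sketch estimates the gradient of the service with respect to the action from a random perturbation of the action itself, chains it through the known policy gradient, and alternates primal ascent with projected dual ascent, so no critic is trained.

```python
import numpy as np


def service(action, h):
    # Black-box service (assumed here: achievable rate). The learner only
    # evaluates it; it never differentiates it analytically.
    return np.log1p(h * np.maximum(action, 0.0))


def policy(theta, h):
    # Deterministic parameterized policy: softplus keeps the power nonnegative.
    return np.logaddexp(0.0, theta[0] + theta[1] * h)


def policy_jacobian(theta, h):
    # Exact gradient of the policy output w.r.t. theta, shape (batch, 2).
    z = theta[0] + theta[1] * h
    s = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid(z)
    return np.stack([s, s * h], axis=-1)


def zeroth_order_grad(fn, action, h, mu, rng):
    # Gaussian-smoothing estimate of d fn / d action, built from one
    # random perturbation per sample in action space.
    u = rng.standard_normal(action.shape)
    return (fn(action + mu * u, h) - fn(action, h)) / mu * u


def train(steps=2000, batch=64, p_max=1.0, mu=1e-2,
          lr_primal=0.05, lr_dual=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # policy parameters (primal variable)
    lam = 0.0            # multiplier of the constraint E[p] <= p_max
    for _ in range(steps):
        h = rng.exponential(1.0, size=batch)  # assumed fading distribution
        a = policy(theta, h)
        # Gradient of the Lagrangian w.r.t. the action: the service part is
        # estimated by perturbation, the power cost -lam * a is known exactly.
        g_a = zeroth_order_grad(service, a, h, mu, rng) - lam
        # Chain rule through the policy, averaged over the batch.
        g_theta = (policy_jacobian(theta, h) * g_a[:, None]).mean(axis=0)
        theta = theta + lr_primal * g_theta                   # primal ascent
        lam = max(0.0, lam + lr_dual * (a.mean() - p_max))    # projected dual step
    return theta, lam


if __name__ == "__main__":
    theta, lam = train()
    h_eval = np.random.default_rng(1).exponential(1.0, size=10000)
    print("theta:", theta, "lambda:", lam)
    print("average power:", policy(theta, h_eval).mean())
```

In this sketch the perturbation lives in the one-dimensional action space rather than in the parameter space, which is the sense in which the exploration is low-dimensional; with a neural-network policy, `policy_jacobian` would be replaced by automatic differentiation.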

Source journal

2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)

Self-citation rate

0.00%

Publication count