Fast Spectrum Sharing in Vehicular Networks: A Meta Reinforcement Learning Approach

Kai Huang, Zezhou Luo, Le Liang, Shi Jin
{"title":"车辆网络快速频谱共享:一种元强化学习方法","authors":"Kai Huang, Zezhou Luo, Le Liang, Shi Jin","doi":"10.1109/VTC2022-Fall57202.2022.10012705","DOIUrl":null,"url":null,"abstract":"In this paper, we investigate the resource allocation problem in a dynamic vehicular environment, where multiple vehicle-to-vehicle links attempt to reuse the spectrum of vehicle-to-infrastructure links. It is modeled as a deep reinforcement learning problem that is subject to proximal policy optimization. Training a well-performing policy usually requires a massive amount of interactions with the environment for a long time and thus is typically performed on a simulator. However, an agent well trained in a simulated environment may still fail when deployed in a live network, due to inevitable difference between the two environments, termed reality gap. We make preliminary efforts to address this issue by leveraging meta reinforcement learning that allows the learning agent to quickly adapt to a new environment with minimal interactions after being trained across a variety of similar tasks. We demonstrate that only a few episodes are required for the meta trained policy to adapt to a new environment and the proposed method is shown to achieve near-optimal performance and exhibit rapid convergence.","PeriodicalId":326047,"journal":{"name":"2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Fast Spectrum Sharing in Vehicular Networks: A Meta Reinforcement Learning Approach\",\"authors\":\"Kai Huang, Zezhou Luo, Le Liang, Shi Jin\",\"doi\":\"10.1109/VTC2022-Fall57202.2022.10012705\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we investigate the resource allocation problem in a dynamic vehicular environment, where multiple vehicle-to-vehicle links attempt to reuse the spectrum of vehicle-to-infrastructure links. It is modeled as a deep reinforcement learning problem that is subject to proximal policy optimization. Training a well-performing policy usually requires a massive amount of interactions with the environment for a long time and thus is typically performed on a simulator. However, an agent well trained in a simulated environment may still fail when deployed in a live network, due to inevitable difference between the two environments, termed reality gap. We make preliminary efforts to address this issue by leveraging meta reinforcement learning that allows the learning agent to quickly adapt to a new environment with minimal interactions after being trained across a variety of similar tasks. 
We demonstrate that only a few episodes are required for the meta trained policy to adapt to a new environment and the proposed method is shown to achieve near-optimal performance and exhibit rapid convergence.\",\"PeriodicalId\":326047,\"journal\":{\"name\":\"2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VTC2022-Fall57202.2022.10012705\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VTC2022-Fall57202.2022.10012705","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

In this paper, we investigate the resource allocation problem in a dynamic vehicular environment, where multiple vehicle-to-vehicle links attempt to reuse the spectrum of vehicle-to-infrastructure links. We model it as a deep reinforcement learning problem and solve it with proximal policy optimization (PPO). Training a well-performing policy usually requires a massive amount of interaction with the environment over a long time and is therefore typically performed on a simulator. However, an agent trained well in a simulated environment may still fail when deployed in a live network, due to inevitable differences between the two environments, termed the reality gap. We make preliminary efforts to address this issue by leveraging meta reinforcement learning, which allows the learning agent, after being trained across a variety of similar tasks, to quickly adapt to a new environment with minimal interaction. We demonstrate that only a few episodes are required for the meta-trained policy to adapt to a new environment, and the proposed method achieves near-optimal performance with rapid convergence.
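The abstract outlines the general recipe: meta-train a policy across many draws of a simulated vehicular environment so that a handful of episodes suffice to adapt to an unseen one. Below is a minimal, self-contained sketch of that idea, not the paper's implementation: the toy contextual-bandit "task" stands in for the V2V/V2I spectrum-sharing simulator, a REINFORCE surrogate stands in for the paper's PPO objective, and the Reptile-style outer update is one common first-order meta-learning rule that may differ from the paper's exact algorithm. All names and dimensions are illustrative.

```python
# Hedged sketch of meta-RL for fast adaptation (not the paper's code).
import copy
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 8, 4  # e.g., local CSI features -> sub-band choice (illustrative)

class Policy(nn.Module):
    """Small MLP mapping a local observation to a distribution over actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def sample_task():
    """A task = a random linear reward model; stands in for one draw of the
    vehicular environment (topology, channel statistics)."""
    return torch.randn(OBS_DIM, N_ACTIONS)

def episode_loss(policy, task, batch=32):
    """REINFORCE surrogate on one batch of interactions (toy stand-in for a
    PPO update on simulated V2V episodes)."""
    obs = torch.randn(batch, OBS_DIM)
    dist = policy(obs)
    actions = dist.sample()
    rewards = (obs @ task).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Mean-reward baseline reduces variance of the gradient estimate.
    return -(dist.log_prob(actions) * (rewards - rewards.mean())).mean()

def adapt(policy, task, steps=5, lr=0.1):
    """Few-episode inner-loop adaptation on a single task."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        episode_loss(adapted, task).backward()
        opt.step()
    return adapted

def meta_train(policy, meta_iters=200, meta_lr=0.05):
    """Reptile-style outer loop: nudge meta-parameters toward task-adapted ones."""
    for _ in range(meta_iters):
        adapted = adapt(policy, sample_task())
        with torch.no_grad():
            for p, q in zip(policy.parameters(), adapted.parameters()):
                p.add_(meta_lr * (q - p))
    return policy

if __name__ == "__main__":
    meta_policy = meta_train(Policy())
    # Deployment: adapt to a previously unseen task with only a few episodes,
    # rather than training a policy from scratch.
    fast_adapted = adapt(meta_policy, sample_task(), steps=5)
```

The property this illustrates is the one the abstract claims: after meta-training, `adapt` needs only a few gradient steps (episodes) on a new task, which is what makes the approach attractive for closing the reality gap with minimal live interaction.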