Fast Spectrum Sharing in Vehicular Networks: A Meta Reinforcement Learning Approach
Kai Huang, Zezhou Luo, Le Liang, Shi Jin
2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), September 2022
DOI: 10.1109/VTC2022-Fall57202.2022.10012705
Citation count: 1
Abstract
In this paper, we investigate the resource allocation problem in a dynamic vehicular environment, where multiple vehicle-to-vehicle (V2V) links attempt to reuse the spectrum of vehicle-to-infrastructure (V2I) links. We formulate it as a deep reinforcement learning (DRL) problem and solve it with proximal policy optimization (PPO). Training a well-performing policy usually requires a massive number of interactions with the environment over a long period, so it is typically performed on a simulator. However, an agent that is well trained in a simulated environment may still fail when deployed in a live network because of the inevitable differences between the two environments, termed the reality gap. We make preliminary efforts to address this issue by leveraging meta reinforcement learning, which allows the learning agent to adapt quickly to a new environment with minimal interactions after being trained across a variety of similar tasks. We demonstrate that the meta-trained policy needs only a few episodes to adapt to a new environment, and that the proposed method achieves near-optimal performance and converges rapidly.
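The abstract does not specify the meta-learning algorithm or the state, action, and reward design, so the following is only a toy sketch of the general idea of "meta-train on similar environments, then adapt quickly to a new one", not the authors' method. It assumes each environment can be reduced to a one-step spectrum-selection task in which a V2V agent picks one of four subbands and receives a fixed mean reward; the reward values, the helper names `adapt` and `reptile`, and all hyperparameters are hypothetical. Meta-training uses Reptile, a first-order meta-learning algorithm swapped in here for illustration. The inner loop adapts a softmax policy with the exact policy gradient dJ/dtheta_k = pi_k (r_k - J), where a real agent would estimate it from sampled episodes (for example with PPO). The outer loop moves the shared initialization toward each adapted policy.

```python
import math
from typing import List

# Hypothetical toy model: each "environment" is summarized by the mean reward
# (e.g., a normalized V2V rate) of choosing each of K subbands in one step.


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax policy over subbands."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits: List[float], rewards: List[float]) -> float:
    """J(theta) = sum_k pi_k * r_k for a one-step (bandit) episode."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def adapt(logits: List[float], rewards: List[float], steps: int, lr: float) -> List[float]:
    """Inner loop (hypothetical helper): gradient ascent on J using the exact
    softmax policy gradient dJ/dtheta_k = pi_k * (r_k - J). A real agent would
    estimate this from sampled episodes; the exact form keeps the sketch deterministic."""
    theta = list(logits)
    for _ in range(steps):
        pi = softmax(theta)
        j = sum(p * r for p, r in zip(pi, rewards))
        theta = [t + lr * p * (r - j) for t, p, r in zip(theta, pi, rewards)]
    return theta


def reptile(tasks: List[List[float]], meta_iters: int, inner_steps: int,
            inner_lr: float, meta_lr: float) -> List[float]:
    """Outer loop (Reptile): adapt from the shared initialization on one task,
    then move the initialization part of the way toward the adapted parameters."""
    theta = [0.0] * len(tasks[0])
    for it in range(meta_iters):
        task = tasks[it % len(tasks)]  # cycle through training tasks deterministically
        adapted = adapt(theta, task, inner_steps, inner_lr)
        theta = [t + meta_lr * (a - t) for t, a in zip(theta, adapted)]
    return theta


# Hypothetical simulated training environments: similar, not identical, reward profiles.
train_tasks = [
    [0.20, 0.50, 0.90, 0.30],
    [0.30, 0.45, 0.80, 0.40],
    [0.15, 0.60, 0.95, 0.25],
]
# Hypothetical held-out "live" environment with a shifted profile (a stand-in for the reality gap).
new_task = [0.25, 0.55, 0.85, 0.35]

meta_init = reptile(train_tasks, meta_iters=300, inner_steps=5, inner_lr=1.0, meta_lr=0.5)
scratch_init = [0.0] * 4  # uniform policy, no prior experience

few_shot = 3  # only a few adaptation steps in the new environment
meta_score = expected_reward(adapt(meta_init, new_task, few_shot, 1.0), new_task)
scratch_score = expected_reward(adapt(scratch_init, new_task, few_shot, 1.0), new_task)
print(f"meta-init after {few_shot} steps:    {meta_score:.3f}")
print(f"scratch-init after {few_shot} steps: {scratch_score:.3f}")
```

Because every toy training environment favors the same subband, the meta-learned initialization already concentrates on it, so three adaptation steps in the shifted environment reach a much higher expected reward than three steps from a uniform policy. If the training environments disagreed about the best subband, a single shared initialization would likely help much less, which is one reason meta-training relies on a family of genuinely similar tasks.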