Multi-Agent Model-Based Reinforcement Learning for Trajectory Design and Power Control in UAV-Enabled Networks
Authors: Shiyang Zhou, Yufan Cheng, Xia Lei
Published in: 2022 3rd Information Communication Technologies Conference (ICTC), 2022-05-06
DOI: https://doi.org/10.1109/ictc55111.2022.9778837
Unmanned aerial vehicles (UAVs) serving as aerial base stations are a promising technology for wireless communications. This paper formulates a joint optimization problem of UAV trajectory design and power control that minimizes power consumption while satisfying users’ QoS requirements in downlink transmission. First, a multi-agent deep deterministic policy gradient (MADDPG) scheme with centralized training and decentralized execution is proposed to improve the overall performance of the cooperating UAVs. Second, model value expansion (MVE) is incorporated into the model-free MADDPG scheme. By imagining future transitions, the proposed multi-agent model value expansion deep deterministic policy gradient (MA-MVE-DDPG) algorithm generates additional experience and thus accelerates training. Simulation results demonstrate that the proposed MA-MVE-DDPG algorithm achieves better performance and converges faster than the benchmark schemes.
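The idea of imagining future transitions can be illustrated with a minimal single-agent sketch of a value-expansion target. This is not the authors' implementation: the function `mve_target`, its parameters, and the toy `policy`, `model`, `reward_fn`, and `critic` callables are all hypothetical stand-ins, and the abstract does not specify the rollout horizon, the dynamics model, or the reward. In a MADDPG-style setting the critic would typically be centralized and take the observations and actions of all UAVs; that detail is omitted here for brevity.

```python
import numpy as np


def mve_target(state, policy, model, reward_fn, critic, horizon=3, gamma=0.95):
    """Return an H-step value-expansion target from an imagined rollout.

    All callables are illustrative assumptions, not the paper's API:
      policy(s) -> action, model(s, a) -> predicted next state,
      reward_fn(s, a) -> scalar reward, critic(s, a) -> scalar Q estimate.
    """
    target = 0.0
    discount = 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)
        # Accumulate the discounted reward of the imagined step.
        target += discount * reward_fn(s, a)
        # Advance with the learned dynamics model instead of the real environment.
        s = model(s, a)
        discount *= gamma
    # Bootstrap with the critic at the last imagined state.
    target += discount * critic(s, policy(s))
    return target


# Toy stand-ins (assumptions): a linear policy, additive dynamics,
# a reward that penalizes squared action magnitude as a crude power proxy,
# and a critic that always returns zero.
def toy_policy(s):
    return -0.5 * s


def toy_model(s, a):
    return s + a


def toy_reward(s, a):
    return -float(np.sum(a ** 2))


def toy_critic(s, a):
    return 0.0


if __name__ == "__main__":
    s0 = np.array([2.0])
    print(mve_target(s0, toy_policy, toy_model, toy_reward, toy_critic,
                     horizon=2, gamma=0.5))
```

With `horizon=0` the function reduces to the ordinary one-network bootstrap `critic(s, policy(s))`, which makes the effect of the imagined steps easy to compare against a model-free target.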