Yuchen Zhang, Xin Hu, Rong Chen, Zhili Zhang, Liquan Wang, Weidong Wang
{"title":"DVB-S2X卫星动态波束跳变:一种多目标深度强化学习方法","authors":"Yuchen Zhang, Xin Hu, Rong Chen, Zhili Zhang, Liquan Wang, Weidong Wang","doi":"10.1109/IUCC/DSCI/SmartCNS.2019.00056","DOIUrl":null,"url":null,"abstract":"Dynamic Beam Hopping (DBH) is a crucial technology for adapting to the flexibility of different service configurations in the multi-beam satellite communications market. The conventional beam hopping method, which ignores the intrinsic correlation between decisions, only obtains the optimal solution at the current time, while deep reinforcement learning (DRL) is a typical algorithm for solving sequential decision problems. Therefore, to deal with the DBH problem in the scenario of Differentiated Services (DIFFSERV), this paper designs a multiobjective deep reinforcement learning (MO-DRL) algorithm. Besides, as the demand for the number of beams increases, the complexity of system implementation increase significantly. This paper innovatively proposes a time division multi-action selectionmethod(TD-MASM) tosolvethecurseofdimensionality problem. Under the real condition, the MO-DRL algorithm with the low complexity can ensure the fairness of each cell, improve the throughput to about 5540Mbps, and reduce the delay to about 0.367ms. 
The simulation results show that when the GA is used to achieve similar effects, the complexity of GA is about 110 times that of the MO-DRL algorithm.","PeriodicalId":410905,"journal":{"name":"2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Dynamic Beam Hopping for DVB-S2X Satellite: A Multi-Objective Deep Reinforcement Learning Approach\",\"authors\":\"Yuchen Zhang, Xin Hu, Rong Chen, Zhili Zhang, Liquan Wang, Weidong Wang\",\"doi\":\"10.1109/IUCC/DSCI/SmartCNS.2019.00056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dynamic Beam Hopping (DBH) is a crucial technology for adapting to the flexibility of different service configurations in the multi-beam satellite communications market. The conventional beam hopping method, which ignores the intrinsic correlation between decisions, only obtains the optimal solution at the current time, while deep reinforcement learning (DRL) is a typical algorithm for solving sequential decision problems. Therefore, to deal with the DBH problem in the scenario of Differentiated Services (DIFFSERV), this paper designs a multiobjective deep reinforcement learning (MO-DRL) algorithm. Besides, as the demand for the number of beams increases, the complexity of system implementation increase significantly. This paper innovatively proposes a time division multi-action selectionmethod(TD-MASM) tosolvethecurseofdimensionality problem. Under the real condition, the MO-DRL algorithm with the low complexity can ensure the fairness of each cell, improve the throughput to about 5540Mbps, and reduce the delay to about 0.367ms. 
The simulation results show that when the GA is used to achieve similar effects, the complexity of GA is about 110 times that of the MO-DRL algorithm.\",\"PeriodicalId\":410905,\"journal\":{\"name\":\"2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IUCC/DSCI/SmartCNS.2019.00056\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conferences on Ubiquitous Computing & Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IUCC/DSCI/SmartCNS.2019.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic Beam Hopping for DVB-S2X Satellite: A Multi-Objective Deep Reinforcement Learning Approach
Dynamic Beam Hopping (DBH) is a crucial technology for adapting to the flexibility of different service configurations in the multi-beam satellite communications market. The conventional beam hopping method, which ignores the intrinsic correlation between successive decisions, obtains only the solution that is optimal at the current time, whereas deep reinforcement learning (DRL) is well suited to sequential decision problems. To handle the DBH problem under Differentiated Services (DiffServ), this paper therefore designs a multi-objective deep reinforcement learning (MO-DRL) algorithm. In addition, as the required number of beams grows, the complexity of the system implementation increases significantly; this paper innovatively proposes a time-division multi-action selection method (TD-MASM) to address the resulting curse-of-dimensionality problem. Under realistic conditions, the low-complexity MO-DRL algorithm ensures fairness among cells, improves throughput to about 5540 Mbps, and reduces delay to about 0.367 ms. Simulation results show that a genetic algorithm (GA) achieving similar results incurs roughly 110 times the computational complexity of the MO-DRL algorithm.
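The abstract does not detail TD-MASM, but the general idea of time-division action selection — replacing one joint choice over all beam combinations with a sequence of per-sub-slot choices — can be illustrated with a toy sketch. Everything here (cell count, beam count, the greedy per-sub-slot rule, the `q_values` scores) is an illustrative assumption, not the paper's actual method:

```python
from math import comb

# Hypothetical scenario sizes, chosen only for illustration.
N_CELLS = 37   # candidate cells
K_BEAMS = 8    # beams illuminated per time slot

def joint_action_space(n: int, k: int) -> int:
    """Size of a joint action space: one action per k-subset of n cells."""
    return comb(n, k)

def time_division_select(q_values, k):
    """Toy time-division selection: pick k cells one sub-slot at a time,
    masking cells already chosen, so each decision ranges over at most
    len(q_values) options instead of comb(n, k) joint actions."""
    chosen = []
    available = set(range(len(q_values)))
    for _ in range(k):
        best = max(available, key=lambda c: q_values[c])  # greedy per sub-slot
        chosen.append(best)
        available.remove(best)
    return chosen

# Joint selection would need comb(37, 8) = 38,608,020 distinct actions,
# while the sub-slot decomposition needs only N_CELLS options per each of
# K_BEAMS sub-slots — the dimensionality reduction TD-MASM aims at.
```

This only shows why decomposing the beam-selection step shrinks the action space; the paper's actual TD-MASM operates inside a trained MO-DRL policy rather than on fixed scores.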