{"title":"支持noma的移动边缘计算的分布式强化学习","authors":"Zhong Yang, Yuanwei Liu, Yue Chen","doi":"10.1109/ICCWorkshops49005.2020.9145457","DOIUrl":null,"url":null,"abstract":"A novel non-orthogonal multiple access (NOMA) enabled cache-aided mobile edge computing (MEC) framework is proposed, for minimizing the sum energy consumption. The NOMA strategy enables mobile users to offload computation tasks to the access point (AP) simultaneously, which improves the spectrum efficiency. In this article, the considered resource allocation problem is formulated as a long-term reward maximization problem that involves a joint optimization of task offloading decision, computation resource allocation, and caching decision. To tackle this nontrivial problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy from historical experience. Moreover, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA based action select scheme is proposed for the agents in MAQ-learning to select the optimal actions in every state. The proposed BLA based action selection scheme is instantaneously self-correcting, consequently, if the probabilities of two computing models (i.e., local computing and offloading computing) are not equal, the optimal action unveils eventually. Extensive simulations demonstrate that: 1) The proposed cache-aided NOMA MEC framework significantly outperforms the other representative benchmark schemes under various network setups. 2) The effectiveness of the proposed BAL-MAQ-learning algorithm is confirmed from the comparison with the results of conventional reinforcement learning algorithms.","PeriodicalId":254869,"journal":{"name":"2020 IEEE International Conference on Communications Workshops (ICC Workshops)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Distributed Reinforcement Learning for NOMA-Enabled Mobile Edge Computing\",\"authors\":\"Zhong Yang, Yuanwei Liu, Yue Chen\",\"doi\":\"10.1109/ICCWorkshops49005.2020.9145457\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A novel non-orthogonal multiple access (NOMA) enabled cache-aided mobile edge computing (MEC) framework is proposed, for minimizing the sum energy consumption. The NOMA strategy enables mobile users to offload computation tasks to the access point (AP) simultaneously, which improves the spectrum efficiency. In this article, the considered resource allocation problem is formulated as a long-term reward maximization problem that involves a joint optimization of task offloading decision, computation resource allocation, and caching decision. To tackle this nontrivial problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy from historical experience. Moreover, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA based action select scheme is proposed for the agents in MAQ-learning to select the optimal actions in every state. 
The proposed BLA based action selection scheme is instantaneously self-correcting, consequently, if the probabilities of two computing models (i.e., local computing and offloading computing) are not equal, the optimal action unveils eventually. Extensive simulations demonstrate that: 1) The proposed cache-aided NOMA MEC framework significantly outperforms the other representative benchmark schemes under various network setups. 2) The effectiveness of the proposed BAL-MAQ-learning algorithm is confirmed from the comparison with the results of conventional reinforcement learning algorithms.\",\"PeriodicalId\":254869,\"journal\":{\"name\":\"2020 IEEE International Conference on Communications Workshops (ICC Workshops)\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Communications Workshops (ICC Workshops)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCWorkshops49005.2020.9145457\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Communications Workshops (ICC Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWorkshops49005.2020.9145457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distributed Reinforcement Learning for NOMA-Enabled Mobile Edge Computing
A novel non-orthogonal multiple access (NOMA) enabled, cache-aided mobile edge computing (MEC) framework is proposed for minimizing the sum energy consumption. The NOMA strategy enables mobile users to offload computation tasks to the access point (AP) simultaneously, which improves spectrum efficiency. In this paper, the considered resource allocation problem is formulated as a long-term reward maximization problem involving the joint optimization of task offloading decisions, computation resource allocation, and caching decisions. To tackle this nontrivial problem, a single-agent Q-learning (SAQ-learning) algorithm is employed to learn a long-term resource allocation strategy from historical experience. Moreover, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA-based action selection scheme is proposed for the agents in MAQ-learning to select the optimal action in every state. The proposed BLA-based action selection scheme is instantaneously self-correcting: whenever the success probabilities of the two computing modes (i.e., local computing and offloading) are unequal, the optimal action is eventually identified. Extensive simulations demonstrate that: 1) the proposed cache-aided NOMA MEC framework significantly outperforms representative benchmark schemes under various network setups; and 2) the effectiveness of the proposed BLA-MAQ-learning algorithm is confirmed by comparison with conventional reinforcement learning algorithms.
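To make the learning machinery concrete, the following is a minimal tabular Q-learning sketch in Python. It is not the paper's implementation: the state encoding, action set, and reward value are placeholders (in the paper's formulation the reward would be tied to the sum energy consumption of the offloading, computation, and caching decisions, so that maximizing the long-term reward minimizes energy).

```python
import numpy as np

# Minimal tabular Q-learning sketch (hypothetical MDP encoding; the
# paper's state/action spaces and energy-based reward are not
# reproduced here). Q is an (n_states, n_actions) value table.
def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """One SAQ-learning-style update: move Q(state, action) toward
    the observed reward plus the discounted best next-state value."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Toy usage: 16 states, 4 resource-allocation actions; the negative
# reward stands in for an energy cost.
Q = np.zeros((16, 4))
Q = q_learning_update(Q, state=3, action=1, reward=-2.5, next_state=7)
```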
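The BLA-based action selection can be read as Thompson-sampling-style arbitration between the two computing modes. The sketch below is an illustration under assumed definitions, not the paper's code: each action keeps a Beta posterior over its probability of yielding a "good" outcome (e.g., lower energy than the alternative), the agent plays the action with the larger posterior sample, and binary feedback updates the chosen action's posterior. Because each sample reflects all feedback received so far, the selection is self-correcting in the sense the abstract describes: if the two success probabilities are unequal, the better action is eventually preferred.

```python
import random

class BLASelector:
    """Bayesian-learning-automaton-style selector over two actions,
    e.g. 0 = local computing, 1 = offloading (assumed mapping)."""

    def __init__(self, n_actions=2):
        # Beta(1, 1) prior, i.e. uniform over [0, 1] for each action.
        self.alpha = [1.0] * n_actions
        self.beta = [1.0] * n_actions

    def select(self):
        """Sample each action's Beta posterior; play the argmax."""
        draws = [random.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, action, success):
        """Binary feedback: success strengthens alpha, failure beta."""
        if success:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0
```

In the multi-agent setting, each user agent could plausibly hold one such selector per state inside its Q-learning loop, with "success" defined by whether the chosen computing mode reduced energy consumption; that wiring is an assumption here, as the abstract does not spell it out.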