{"title":"基于强化学习的无线资源分配","authors":"Rui Wang","doi":"10.1049/PBTE081E_CH11","DOIUrl":null,"url":null,"abstract":"In this chapter, we shall focus on the formulation of radio resource management via Markov decision process (MDP). Convex optimization has been widely used in the RRM within a short-time duration, where the wireless channel is assumed to be quasi-static. These problems are usually referred to as deterministic optimization problems. On the other hand, MDP is an elegant and powerful tool to handle the resource optimization of wireless systems in a longer timescale, where the random transitions of system and channel status are considered.These problems are usually referred to as stochastic optimization problems. Particularly, MDP is suitable for the joint optimization between physical and media-access control (MAC) layers. Based on MDP, reinforcement learning is a practical method to address the optimization without a priori knowledge of system statistics. In this chapter, we shall first introduce some basics on stochastic approximation, which serves as one basis of reinforcement learning, and then demonstrate the MDP formulations of RRM via some case studies, which require the knowledge of system statistics. Finally, some approaches of reinforcement learning (e.g., Q-learning) are introduced to address the practical issue of unknown system statistics.","PeriodicalId":358911,"journal":{"name":"Applications of Machine Learning in Wireless Communications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement-learning-based wireless resource allocation\",\"authors\":\"Rui Wang\",\"doi\":\"10.1049/PBTE081E_CH11\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this chapter, we shall focus on the formulation of radio resource management via Markov decision process (MDP). Convex optimization has been widely used in the RRM within a short-time duration, where the wireless channel is assumed to be quasi-static. These problems are usually referred to as deterministic optimization problems. On the other hand, MDP is an elegant and powerful tool to handle the resource optimization of wireless systems in a longer timescale, where the random transitions of system and channel status are considered.These problems are usually referred to as stochastic optimization problems. Particularly, MDP is suitable for the joint optimization between physical and media-access control (MAC) layers. Based on MDP, reinforcement learning is a practical method to address the optimization without a priori knowledge of system statistics. In this chapter, we shall first introduce some basics on stochastic approximation, which serves as one basis of reinforcement learning, and then demonstrate the MDP formulations of RRM via some case studies, which require the knowledge of system statistics. 
Finally, some approaches of reinforcement learning (e.g., Q-learning) are introduced to address the practical issue of unknown system statistics.\",\"PeriodicalId\":358911,\"journal\":{\"name\":\"Applications of Machine Learning in Wireless Communications\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applications of Machine Learning in Wireless Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1049/PBTE081E_CH11\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applications of Machine Learning in Wireless Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/PBTE081E_CH11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
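To make the stated connection concrete (a sketch added here, not text from the chapter): stochastic approximation, in its classical Robbins-Monro form, seeks a root of h(\theta) = \mathbb{E}[H(\theta, \xi)] from noisy samples via the recursion

    \theta_{t+1} = \theta_t - \alpha_t H(\theta_t, \xi_t), \qquad \sum_t \alpha_t = \infty, \quad \sum_t \alpha_t^2 < \infty.

Q-learning is exactly such an iteration, with \theta the table of Q-values and H(\theta_t, \xi_t) the temporal-difference error observed on the sampled transition \xi_t.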
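As a hedged illustration of the final point, the sketch below runs tabular Q-learning on a toy resource-allocation MDP. Everything in it (the state space, power levels, reward, i.i.d. channel transitions, and step-size schedule) is an assumption made for the example, not the chapter's actual model; the step() function merely stands in for the unknown system statistics that Q-learning learns to act against without a model.

    # Minimal tabular Q-learning on a toy resource-allocation MDP.
    # All dynamics below are illustrative assumptions, not the chapter's model:
    # a transmitter picks a discrete power level (action) given a quantized
    # channel state, earning a reward that trades throughput against power cost.
    import random

    N_STATES = 4      # quantized channel-quality levels (assumed)
    N_ACTIONS = 3     # discrete transmit-power levels (assumed)
    GAMMA = 0.9       # discount factor
    EPSILON = 0.1     # exploration rate for the epsilon-greedy policy


    def step(state, action):
        """Hypothetical environment: returns (next_state, reward).

        Stands in for the unknown system statistics; Q-learning never
        inspects these transition/reward laws, it only samples them.
        """
        reward = float(action * (state + 1)) - 0.5 * action ** 2
        next_state = random.randrange(N_STATES)  # i.i.d. channel, for simplicity
        return next_state, reward


    def q_learning(episodes=2000, steps_per_episode=50):
        q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        t = 0
        for _ in range(episodes):
            state = random.randrange(N_STATES)
            for _ in range(steps_per_episode):
                t += 1
                alpha = 1.0 / t ** 0.6  # diminishing Robbins-Monro-type step size
                # Epsilon-greedy action selection.
                if random.random() < EPSILON:
                    action = random.randrange(N_ACTIONS)
                else:
                    action = max(range(N_ACTIONS), key=lambda a: q[state][a])
                next_state, reward = step(state, action)
                # Q-learning update: a stochastic-approximation iteration
                # on the Bellman optimality equation.
                td_target = reward + GAMMA * max(q[next_state])
                q[state][action] += alpha * (td_target - q[state][action])
                state = next_state
        return q


    if __name__ == "__main__":
        q = q_learning()
        for s, row in enumerate(q):
            best = max(range(N_ACTIONS), key=lambda a: row[a])
            print(f"state {s}: greedy power level {best}, "
                  f"Q-values {[round(v, 2) for v in row]}")

Note the diminishing step size alpha_t = t^{-0.6}, chosen so that the sum of alpha_t diverges while the sum of alpha_t^2 converges, the standard stochastic-approximation condition under which the Q-iterates converge despite the unknown statistics.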