{"title":"部分可观察NFV环境下基于深度强化学习的异构流调度","authors":"Chun Jen Lin, Yan Luo, Liang-Min Wang","doi":"10.1109/NaNA53684.2021.00081","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (DRL) has yielded proficient controllers for complex tasks. DRL trains machine learning models for decision making to maximize rewards in uncertain environments such as network function virtualization (NFV). However, when facing limited information, agents often have difficulties making decisions at some decision point. In a real-world NFV environment, we may have incomplete information about network flow patterns. Compared with complete information feedback, it increases the difficulty to predict an optimal policy since important state information is missing. In this paper, we formulate a Partially Observable Markov Decision Process (POMDP) with a partially unknown NFV system. To address the shortcomings in real-world NFV, we conduct an extensive simulation to investigate the effects of adding recurrency to a Proximal Policy optimization (PPO2) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM or adding stacked frames as input. The results show that RL based schedulers using stacking a history of frames in the PPO2’s input layer can easily adapt at evaluation time if the quality of observations changes.","PeriodicalId":414672,"journal":{"name":"2021 International Conference on Networking and Network Applications (NaNA)","volume":"393 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Heterogeneous Flow Scheduling using Deep Reinforcement Learning in Partially Observable NFV Environment\",\"authors\":\"Chun Jen Lin, Yan Luo, Liang-Min Wang\",\"doi\":\"10.1109/NaNA53684.2021.00081\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Reinforcement Learning (DRL) has yielded proficient controllers for complex tasks. DRL trains machine learning models for decision making to maximize rewards in uncertain environments such as network function virtualization (NFV). However, when facing limited information, agents often have difficulties making decisions at some decision point. In a real-world NFV environment, we may have incomplete information about network flow patterns. Compared with complete information feedback, it increases the difficulty to predict an optimal policy since important state information is missing. In this paper, we formulate a Partially Observable Markov Decision Process (POMDP) with a partially unknown NFV system. To address the shortcomings in real-world NFV, we conduct an extensive simulation to investigate the effects of adding recurrency to a Proximal Policy optimization (PPO2) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM or adding stacked frames as input. 
The results show that RL based schedulers using stacking a history of frames in the PPO2’s input layer can easily adapt at evaluation time if the quality of observations changes.\",\"PeriodicalId\":414672,\"journal\":{\"name\":\"2021 International Conference on Networking and Network Applications (NaNA)\",\"volume\":\"393 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Networking and Network Applications (NaNA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NaNA53684.2021.00081\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Networking and Network Applications (NaNA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NaNA53684.2021.00081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Heterogeneous Flow Scheduling using Deep Reinforcement Learning in Partially Observable NFV Environment
Deep Reinforcement Learning (DRL) has yielded proficient controllers for complex tasks. DRL trains machine learning models to make decisions that maximize rewards in uncertain environments such as network function virtualization (NFV). However, with only limited information, agents often struggle to make good decisions at certain decision points. In a real-world NFV environment, we may have incomplete information about network flow patterns. Compared with complete information feedback, this makes it harder to learn an optimal policy because important state information is missing. In this paper, we formulate the scheduling problem as a Partially Observable Markov Decision Process (POMDP) over a partially unknown NFV system. To address these shortcomings of real-world NFV, we conduct an extensive simulation study of the effects of adding recurrence to Proximal Policy Optimization (PPO2), either by replacing the first post-convolutional fully-connected layer with a recurrent LSTM or by stacking a history of frames as input. The results show that RL-based schedulers that stack a history of frames in the PPO2 input layer adapt easily at evaluation time when the quality of observations changes.
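To make the two observation-handling variants concrete, the sketch below is a minimal plain-PyTorch illustration, not the authors' actual network: the layer sizes, the assumed 84x84 observation shape, and the class names are hypothetical. It shows a policy/value network whose first post-convolutional fully-connected layer is replaced by an LSTM, and a simple helper that stacks the k most recent observations along the channel axis for the frame-stacking variant.

```python
# Minimal sketch of the two variants discussed in the abstract.
# Hypothetical shapes and hyperparameters; not the paper's exact architecture.
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class RecurrentActorCritic(nn.Module):
    """CNN encoder whose first post-convolutional FC layer is replaced by an LSTM."""

    def __init__(self, in_channels: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy 84x84 observation.
        with torch.no_grad():
            feat_dim = self.conv(torch.zeros(1, in_channels, 84, 84)).shape[1]
        # LSTM takes the place of the first fully-connected layer after the CNN.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, hidden_state=None):
        # obs: (batch, channels, H, W); each step is treated as a length-1 sequence.
        feats = self.conv(obs).unsqueeze(1)
        out, hidden_state = self.lstm(feats, hidden_state)
        out = out.squeeze(1)
        return self.policy_head(out), self.value_head(out), hidden_state


class FrameStacker:
    """Stack the k most recent observations along the channel axis (non-recurrent variant)."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # Fill the buffer with the first observation so the stacked shape is constant.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=0)

    def step(self, obs: np.ndarray) -> np.ndarray:
        self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=0)
```

In the recurrent variant the LSTM hidden state carries history across steps, so the agent can compensate for missing state information; in the frame-stacking variant the history is made explicit in the input tensor, which is what the abstract reports as adapting more easily when observation quality changes at evaluation time.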