Understanding Exploration and Exploitation of Q-Learning Agents in B5G Network Management

Sayantini Majumdar, R. Trivisonno, G. Carle
2021 IEEE Globecom Workshops (GC Wkshps), December 2021, pp. 1-6
DOI: 10.1109/GCWkshps52748.2021.9682129
Auto-scaling is a lifecycle management approach that automatically scales resources (CPU, memory, etc.) based on incoming load to optimize resource utilization. Centralized orchestration, although optimal, comes at the cost of high signaling overhead. Alternatively, decentralized RL-based approaches such as Q-Learning (QL) are envisaged to be more suitable for the strict latency and overhead requirements of B5G/6G use cases, while also minimizing the number of resource allocation conflicts encountered in a distributed setting. Before QL agents can make optimal auto-scaling decisions, they need to explore, i.e., evaluate their actions based on the feedback they receive from the environment. The faster they learn, the sooner they can begin to exploit their knowledge. However, it is not clear when these agents have explored long enough to start taking management actions. This paper focuses on understanding when exploration should end so that agents may start exploiting the knowledge they have built. In our approach, we posit that the knowledge accrued by the agents in their Q-tables should indicate whether to explore or exploit. Hence, we conceive Knowledge Indicators (KIs) derived from their Q-tables. These KIs enable agents to learn autonomously, adjusting the exploration parameter epsilon in the epsilon-greedy approach. Convergence results and the corresponding impact on system performance validate the proposed approach. This work has the potential to speed up the convergence of QL agents, thereby providing critical hints to operators targeting live deployments of B5G/6G decentralized network management.
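To make the idea concrete, the mechanism described in the abstract can be sketched as a tabular Q-learning agent whose epsilon is driven by a Knowledge Indicator computed from its own Q-table. The paper does not give its exact KI definition here, so the indicator below (the magnitude of the latest Q-value update, i.e., how much the table is still changing) is an illustrative assumption, as are all class and variable names; it is a minimal sketch, not the authors' implementation.

```python
import random

class QLAgent:
    """Tabular Q-learning with KI-driven epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, states, actions, alpha=0.1, gamma=0.9):
        self.q = {(s, a): 0.0 for s in states for a in actions}
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.epsilon = 1.0  # start fully exploratory

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit the Q-table.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        delta = reward + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * delta
        # Assumed KI: |delta| as a proxy for remaining uncertainty in the
        # Q-table. As updates stabilize, epsilon shrinks and the agent
        # shifts from exploration to exploitation on its own.
        self.epsilon = max(0.05, min(1.0, abs(delta)))
```

In a toy loop, the agent's epsilon is no longer a hand-tuned schedule but follows the Q-table's convergence, which is the autonomy the KIs are meant to provide:

```python
random.seed(0)
agent = QLAgent(states=[0, 1], actions=[0, 1])
for _ in range(200):
    s = random.choice([0, 1])
    a = agent.act(s)
    r = 1.0 if a == s else 0.0  # toy reward: action should match state
    agent.update(s, a, r, 1 - s)
```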