{"title":"时间敏感网络的强化学习辅助路由","authors":"Nurefsan Sertbas Bülbül, Mathias Fischer","doi":"10.1109/GLOBECOM48099.2022.10001630","DOIUrl":null,"url":null,"abstract":"Recent developments in real-time critical systems pave the way for different application scenarios such as Industrial IoT with various quality-of-service (QoS) requirements. The most critical common feature of such applications is that they are sensitive to latency and jitter. Thus, it is desired to perform flow placements strategically considering application requirements due to limited resource availability. In this paper, path computation for time-sensitive networks is investigated while satisfying individual end-to-end delay requirements of critical traffic. The problem is formulated as a mixed-integer linear program (MILP) which is NP-hard with exponentially increasing computational complexity as the network size expands. To solve the MILP with high efficiency, we propose a reinforcement learning (RL) algorithm that learns the best routing policy by continuously interacting with the network environment. The proposed learning algorithm determines the variable action set at each decision-making state and captures different execution times of the actions. The reward function in the proposed algorithm is carefully designed for meeting individual flow deadlines. 
Simulation results indicate that the proposed reinforcement learning algorithm can produce near-optimal flow allocations (close by ~1.5 %) and scales well even with large topology sizes.","PeriodicalId":313199,"journal":{"name":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","volume":"160 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Reinforcement Learning assisted Routing for Time Sensitive Networks\",\"authors\":\"Nurefsan Sertbas Bülbül, Mathias Fischer\",\"doi\":\"10.1109/GLOBECOM48099.2022.10001630\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent developments in real-time critical systems pave the way for different application scenarios such as Industrial IoT with various quality-of-service (QoS) requirements. The most critical common feature of such applications is that they are sensitive to latency and jitter. Thus, it is desired to perform flow placements strategically considering application requirements due to limited resource availability. In this paper, path computation for time-sensitive networks is investigated while satisfying individual end-to-end delay requirements of critical traffic. The problem is formulated as a mixed-integer linear program (MILP) which is NP-hard with exponentially increasing computational complexity as the network size expands. To solve the MILP with high efficiency, we propose a reinforcement learning (RL) algorithm that learns the best routing policy by continuously interacting with the network environment. The proposed learning algorithm determines the variable action set at each decision-making state and captures different execution times of the actions. The reward function in the proposed algorithm is carefully designed for meeting individual flow deadlines. 
Simulation results indicate that the proposed reinforcement learning algorithm can produce near-optimal flow allocations (close by ~1.5 %) and scales well even with large topology sizes.\",\"PeriodicalId\":313199,\"journal\":{\"name\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"volume\":\"160 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GLOBECOM48099.2022.10001630\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GLOBECOM48099.2022.10001630","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement Learning assisted Routing for Time Sensitive Networks
Recent developments in real-time critical systems pave the way for application scenarios such as the Industrial IoT, each with different quality-of-service (QoS) requirements. The most critical common feature of these applications is their sensitivity to latency and jitter. Because network resources are limited, flows should therefore be placed strategically with application requirements in mind. In this paper, we investigate path computation for time-sensitive networks while satisfying the individual end-to-end delay requirements of critical traffic. The problem is formulated as a mixed-integer linear program (MILP), which is NP-hard and whose computational complexity grows exponentially with network size. To solve it efficiently, we propose a reinforcement learning (RL) algorithm that learns the best routing policy by continuously interacting with the network environment. The proposed algorithm determines the variable action set at each decision-making state and captures the different execution times of the actions. Its reward function is carefully designed to meet individual flow deadlines. Simulation results indicate that the proposed algorithm produces near-optimal flow allocations (within ~1.5% of the optimum) and scales well even for large topologies.
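To make the idea of the abstract concrete, the following is a minimal, illustrative sketch of deadline-aware routing with tabular Q-learning: the action set varies per state (each node's neighbors), per-hop rewards reflect link delay, and the terminal reward rewards or penalizes meeting the end-to-end deadline. The topology, delays, and hyperparameters are invented for demonstration; the paper's actual algorithm is more elaborate than this sketch.

```python
import random

# Directed graph: node -> {neighbor: link delay in ms} (hypothetical values)
DELAYS = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 6.0},
    "C": {"D": 2.0},
    "D": {},
}
SRC, DST, DEADLINE = "A", "D", 8.0  # end-to-end delay budget

ALPHA, GAMMA, EPS, EPISODES = 0.5, 0.9, 0.2, 2000
Q = {(n, a): 0.0 for n, nbrs in DELAYS.items() for a in nbrs}

def choose(node):
    """Epsilon-greedy over the *variable* action set: this node's neighbors."""
    actions = list(DELAYS[node])
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(node, a)])

random.seed(0)
for _ in range(EPISODES):
    node, elapsed = SRC, 0.0
    while node != DST and DELAYS[node]:
        nxt = choose(node)
        elapsed += DELAYS[node][nxt]
        if nxt == DST:
            # Deadline-aware terminal reward: bonus within budget, penalty otherwise
            r = 10.0 if elapsed <= DEADLINE else -10.0
            Q[(node, nxt)] += ALPHA * (r - Q[(node, nxt)])
        else:
            # Per-hop cost is the link delay, plus discounted future value
            best_next = max(Q[(nxt, a)] for a in DELAYS[nxt])
            r = -DELAYS[node][nxt]
            Q[(node, nxt)] += ALPHA * (r + GAMMA * best_next - Q[(node, nxt)])
        node = nxt

# Greedy rollout of the learned policy
path, node = [SRC], SRC
while node != DST:
    node = max(DELAYS[node], key=lambda a: Q[(node, a)])
    path.append(node)
delay = sum(DELAYS[u][v] for u, v in zip(path, path[1:]))
print(path, delay)
```

On this toy topology the learned policy routes the flow so that its total delay stays within the 8 ms budget; in the paper's setting, the same principle is applied per flow against individual deadlines.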