DRL Router: Distributional Reinforcement Learning-Based Router for Reliable Shortest Path Problems
Authors: Hongliang Guo, Wenda Sheng, Chen Gao, Yaochu Jin
Journal: IEEE Intelligent Transportation Systems Magazine, volume 15 1, pages 91-108
Publication date: 2023-09-01
DOI: https://doi.org/10.1109/MITS.2023.3265309
Impact factor: 4.3 · JCR: Q1 (Engineering, Electrical & Electronic) · Publication type: Journal Article
This article studies reliable shortest path (RSP) problems in stochastic transportation networks. The term "reliability" has many definitions in the RSP literature, e.g., 1) maximal stochastic on-time arrival probability, 2) minimal travel time with a high-confidence constraint, 3) minimal mean and standard deviation combination, and 4) minimal expected disutility. To the best of our knowledge, almost all state-of-the-art RSP solutions are designed to target one specific RSP objective, and it is very difficult, if not impossible, to adapt them to other RSP objectives. To bridge this gap, this article develops a distributional reinforcement learning (DRL)-based algorithm, DRL-Router, which serves as a universal solution to the four aforementioned RSP problems. DRL-Router uses DRL to approximate the full travel time distribution of a given routing policy and then improves the policy with respect to the user-defined RSP objective through a generalized policy iteration scheme. DRL-Router is 1) universal, i.e., it is applicable to a variety of RSP objectives; 2) model-free, i.e., it does not rely on well-calibrated travel time distribution models; 3) adaptive, i.e., it adjusts to changes in the navigation objective; and 4) fast, i.e., it performs real-time decision making. Extensive experimental results and comparisons with baseline algorithms in various transportation networks demonstrate both the accuracy and the efficiency of DRL-Router.
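To make the four reliability definitions in the abstract concrete, the following minimal sketch scores candidate routes from travel-time samples under each objective. It is an illustration only and not the DRL-Router algorithm: the function names, the two hypothetical routes and their sample values, the deadline, the confidence level, the standard deviation weight, and the disutility functions are all assumptions that do not come from the article.

```python
import math
import statistics

# Hypothetical travel-time samples in minutes (assumed data, not from the article).
ROUTES = {
    "highway": [20, 20, 20, 20, 60],   # usually fast, occasionally jammed
    "arterial": [30, 30, 30, 30, 30],  # slower but perfectly predictable
}

def on_time_probability(samples, deadline):
    """Objective 1: empirical probability of arriving by the deadline (maximize)."""
    return sum(t <= deadline for t in samples) / len(samples)

def quantile_travel_time(samples, confidence):
    """Objective 2: smallest sampled time t with P(T <= t) >= confidence (minimize)."""
    ordered = sorted(samples)
    index = max(0, math.ceil(confidence * len(ordered)) - 1)
    return ordered[index]

def mean_std(samples, weight):
    """Objective 3: mean plus weight times population standard deviation (minimize)."""
    return statistics.fmean(samples) + weight * statistics.pstdev(samples)

def expected_disutility(samples, disutility):
    """Objective 4: average of a user-chosen disutility function (minimize)."""
    return statistics.fmean([disutility(t) for t in samples])

def best_route(route_samples, score, maximize=False):
    """Return the route name whose sample set optimizes the given score."""
    chooser = max if maximize else min
    return chooser(route_samples, key=lambda name: score(route_samples[name]))

if __name__ == "__main__":
    objectives = {
        "on-time probability (35 min deadline)":
            (lambda s: on_time_probability(s, 35), True),
        "travel time at 90% confidence":
            (lambda s: quantile_travel_time(s, 0.9), False),
        "mean + 1.0 * std":
            (lambda s: mean_std(s, 1.0), False),
        "expected disutility (linear, i.e. plain mean)":
            (lambda s: expected_disutility(s, lambda t: t), False),
    }
    for label, (score, maximize) in objectives.items():
        print(label, "->", best_route(ROUTES, score, maximize))
```

Because all four scores are computed from the same samples, switching the objective only changes the scoring function. This mirrors, in a very simplified form, why having the full travel time distribution makes it possible to serve several RSP objectives with one method.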
About the journal:
The IEEE Intelligent Transportation Systems Magazine (ITSM) publishes peer-reviewed articles that provide innovative research ideas and application results, report significant application case studies, and raise awareness of pressing research and application challenges in all areas of intelligent transportation systems. In contrast to the highly academic IEEE Transactions on Intelligent Transportation Systems, the ITS Magazine focuses on providing needed information to all members of the IEEE ITS Society, serving as a dissemination vehicle through which Society members and others can learn about state-of-the-art developments and progress in ITS research and applications. High-quality tutorials, surveys, successful implementations, technology reviews, lessons learned, policy and societal impacts, and ITS educational issues are published as well. The ITS Magazine also serves as a communication channel between the governing body of the ITS Society and its membership, and promotes the development and growth of the ITS community.