{"title":"基于推理图的强化学习,在无信号交叉路口实现混合连接交通和自主交通的合作","authors":"Donghao Zhou, Peng Hang, Jian Sun","doi":"10.1016/j.trc.2024.104807","DOIUrl":null,"url":null,"abstract":"<div><p>Cooperation at unsignalized intersections in mixed traffic environments, where Connected and Autonomous Vehicles (CAVs) and Manually Driving Vehicles (MVs) coexist, holds promise for improving safety, efficiency, and energy savings. However, the mixed traffic at unsignalized intersections present huge challenges like MVs’ uncertainties, the chain reaction and diverse interactions. Following the thought of the situation-aware cooperation, this paper proposes a Reasoning Graph-based Reinforcement Learning (RGRL) method, which integrates a Graph Neural Network (GNN) based policy and an environment providing mixed traffic with uncertain behaviors. Firstly, it graphicly represents the observed scenario as a situation using the interaction graph with connected but uncertain (bi-directional) edges. The situation reasoning process is formulated as a Reasoning Graph-based Markov Decision Process which infers the vehicle sequence stage by stage so as to sequentially depict the entire situation. Then, a GNN-based policy is constructed, which uses Graph Convolution Networks (GCN) to capture the interrelated chain reactions and Graph Attentions Networks (GAT) to measure the attention of diverse interactions. Furthermore, an environment block is developed for training the policy, which provides trajectory generators for both CAVs and MVs. A reward function that considers social compliance, collision avoidance, efficiency and energy savings is also provided in this block. Finally, three Reinforcement Learning methods, D3QN, PPO and SAC, are implemented for comparative tests to explore the applicability and strength of the framework. The test results demonstrate that the D3QN outperformed the other two methods with a larger converged reward while maintaining a similar converged speed. Compared to multi-agent RL (MARL), the RGRL approach showed superior performance statistically, reduced the number of severe conflicts by 77.78–94.12 %. The RGRL reduced average and maximum travel times by 13.62–16.02 %, and fuel-consumption by 3.38–6.98 % in medium or high Market Penetration Rates (MPRs). Hardware-in-the-loop (HIL) and Vehicle-in-the-loop (VehIL) experiments were conducted to validate the model effectiveness.</p></div>","PeriodicalId":54417,"journal":{"name":"Transportation Research Part C-Emerging Technologies","volume":null,"pages":null},"PeriodicalIF":7.6000,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reasoning graph-based reinforcement learning to cooperate mixed connected and autonomous traffic at unsignalized intersections\",\"authors\":\"Donghao Zhou, Peng Hang, Jian Sun\",\"doi\":\"10.1016/j.trc.2024.104807\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Cooperation at unsignalized intersections in mixed traffic environments, where Connected and Autonomous Vehicles (CAVs) and Manually Driving Vehicles (MVs) coexist, holds promise for improving safety, efficiency, and energy savings. However, the mixed traffic at unsignalized intersections present huge challenges like MVs’ uncertainties, the chain reaction and diverse interactions. 
Following the thought of the situation-aware cooperation, this paper proposes a Reasoning Graph-based Reinforcement Learning (RGRL) method, which integrates a Graph Neural Network (GNN) based policy and an environment providing mixed traffic with uncertain behaviors. Firstly, it graphicly represents the observed scenario as a situation using the interaction graph with connected but uncertain (bi-directional) edges. The situation reasoning process is formulated as a Reasoning Graph-based Markov Decision Process which infers the vehicle sequence stage by stage so as to sequentially depict the entire situation. Then, a GNN-based policy is constructed, which uses Graph Convolution Networks (GCN) to capture the interrelated chain reactions and Graph Attentions Networks (GAT) to measure the attention of diverse interactions. Furthermore, an environment block is developed for training the policy, which provides trajectory generators for both CAVs and MVs. A reward function that considers social compliance, collision avoidance, efficiency and energy savings is also provided in this block. Finally, three Reinforcement Learning methods, D3QN, PPO and SAC, are implemented for comparative tests to explore the applicability and strength of the framework. The test results demonstrate that the D3QN outperformed the other two methods with a larger converged reward while maintaining a similar converged speed. Compared to multi-agent RL (MARL), the RGRL approach showed superior performance statistically, reduced the number of severe conflicts by 77.78–94.12 %. The RGRL reduced average and maximum travel times by 13.62–16.02 %, and fuel-consumption by 3.38–6.98 % in medium or high Market Penetration Rates (MPRs). Hardware-in-the-loop (HIL) and Vehicle-in-the-loop (VehIL) experiments were conducted to validate the model effectiveness.</p></div>\",\"PeriodicalId\":54417,\"journal\":{\"name\":\"Transportation Research Part C-Emerging Technologies\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2024-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transportation Research Part C-Emerging Technologies\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0968090X24003280\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"TRANSPORTATION SCIENCE & TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part C-Emerging Technologies","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968090X24003280","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Reasoning graph-based reinforcement learning to cooperate mixed connected and autonomous traffic at unsignalized intersections

Donghao Zhou, Peng Hang, Jian Sun
Transportation Research Part C: Emerging Technologies, published 2024-08-22. DOI: 10.1016/j.trc.2024.104807
Full text: https://www.sciencedirect.com/science/article/pii/S0968090X24003280

Abstract:
Cooperation at unsignalized intersections in mixed traffic environments, where Connected and Autonomous Vehicles (CAVs) and Manually Driving Vehicles (MVs) coexist, holds promise for improving safety, efficiency, and energy savings. However, mixed traffic at unsignalized intersections presents major challenges, including MVs' uncertainties, chain reactions, and diverse interactions. Following the idea of situation-aware cooperation, this paper proposes a Reasoning Graph-based Reinforcement Learning (RGRL) method that integrates a Graph Neural Network (GNN) based policy with an environment providing mixed traffic with uncertain behaviors. First, the observed scenario is graphically represented as a situation using an interaction graph with connected but uncertain (bi-directional) edges. The situation reasoning process is formulated as a Reasoning Graph-based Markov Decision Process, which infers the vehicle sequence stage by stage so as to sequentially depict the entire situation. Then, a GNN-based policy is constructed that uses Graph Convolutional Networks (GCN) to capture the interrelated chain reactions and Graph Attention Networks (GAT) to measure the attention paid to diverse interactions. Furthermore, an environment block is developed for training the policy; it provides trajectory generators for both CAVs and MVs, together with a reward function that considers social compliance, collision avoidance, efficiency, and energy savings. Finally, three reinforcement learning methods, D3QN, PPO, and SAC, are implemented in comparative tests to explore the applicability and strengths of the framework. The test results demonstrate that D3QN outperformed the other two methods, reaching a higher converged reward at a similar convergence speed. Compared to multi-agent RL (MARL), the RGRL approach showed statistically superior performance, reducing the number of severe conflicts by 77.78–94.12 %. At medium and high Market Penetration Rates (MPRs), RGRL reduced average and maximum travel times by 13.62–16.02 % and fuel consumption by 3.38–6.98 %. Hardware-in-the-loop (HIL) and Vehicle-in-the-loop (VehIL) experiments were conducted to validate the model's effectiveness.
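To make the situation representation concrete, the sketch below builds an interaction graph of the kind the abstract describes: vehicles are nodes with kinematic features, and each pair of interacting vehicles is linked by a connected but uncertain (bi-directional) edge. The distance-based conflict test, the 30 m interaction range, and the state fields (`pos`, `speed`, `is_cav`) are illustrative assumptions, not the paper's definitions.

```python
import torch

def build_interaction_graph(states):
    """Hypothetical scene-to-graph step: nodes = vehicles, edges = interactions."""
    # states: list of dicts with assumed fields 'pos' (x, y), 'speed', 'is_cav'
    x = torch.tensor([[s["pos"][0], s["pos"][1], s["speed"], float(s["is_cav"])]
                      for s in states], dtype=torch.float)
    edges = []
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            dx, dy = x[i, 0] - x[j, 0], x[i, 1] - x[j, 1]
            if (dx * dx + dy * dy).sqrt() < 30.0:  # assumed 30 m interaction range
                edges += [[i, j], [j, i]]          # bi-directional: outcome uncertain
    if edges:
        edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    else:
        edge_index = torch.empty((2, 0), dtype=torch.long)
    return x, edge_index
```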
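The abstract's GNN-based policy combines GCN layers (to propagate chain reactions along the graph) with GAT attention (to weight diverse interactions). A minimal sketch of that combination, using PyTorch Geometric's `GCNConv` and `GATConv`, is shown below; the layer sizes, feature dimension, and masking scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv

class ReasoningGraphPolicy(torch.nn.Module):
    """Sketch of a GCN + GAT policy that scores which vehicle crosses next."""
    def __init__(self, node_dim: int = 4, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.gcn1 = GCNConv(node_dim, hidden)   # propagate chain reactions
        self.gcn2 = GCNConv(hidden, hidden)
        self.gat = GATConv(hidden, hidden, heads=heads, concat=False)  # interaction attention
        self.score = torch.nn.Linear(hidden, 1)  # per-vehicle logit

    def forward(self, x, edge_index, mask):
        # x: [n_vehicles, node_dim] features; edge_index: [2, n_edges] bi-directional edges
        # mask: [n_vehicles] bool, True for vehicles already placed in the sequence
        h = F.relu(self.gcn1(x, edge_index))
        h = F.relu(self.gcn2(h, edge_index))
        h = self.gat(h, edge_index)
        logits = self.score(h).squeeze(-1)
        logits = logits.masked_fill(mask, float("-inf"))  # a vehicle is picked only once
        return torch.distributions.Categorical(logits=logits)
```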
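The Reasoning Graph-based MDP infers the vehicle sequence stage by stage. Under the same assumptions as the policy sketch above, the rollout reduces to a loop that repeatedly asks the policy for the next vehicle and freezes that decision before the next stage:

```python
import torch

def infer_sequence(policy, x, edge_index):
    """Stage-by-stage reasoning sketch: one stage per vehicle in the crossing order."""
    n = x.size(0)
    mask = torch.zeros(n, dtype=torch.bool)
    sequence = []
    for _ in range(n):
        dist = policy(x, edge_index, mask)   # distribution over remaining vehicles
        v = dist.sample().item()             # next vehicle in the sequence
        sequence.append(v)
        mask[v] = True                       # freeze this stage's decision
    return sequence
```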
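Finally, the environment block's reward combines social compliance, collision avoidance, efficiency, and energy savings. The sketch below shows one plausible weighted-sum form; the term definitions, thresholds, and weights are assumptions and do not reproduce the paper's formulation.

```python
def reward(social_compliance: float, min_gap_m: float,
           avg_speed_mps: float, fuel_rate_lps: float,
           w=(1.0, 2.0, 0.5, 0.3)) -> float:
    """Illustrative multi-objective reward; all weights/thresholds are assumed."""
    r_social = social_compliance                  # e.g., agreement with human-accepted order
    r_safety = -1.0 if min_gap_m < 2.0 else 0.0   # penalize severe conflicts (assumed 2 m gap)
    r_eff = avg_speed_mps / 15.0                  # normalized throughput proxy
    r_energy = -fuel_rate_lps                     # lower fuel consumption is better
    return w[0]*r_social + w[1]*r_safety + w[2]*r_eff + w[3]*r_energy
```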
Journal Introduction:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.