Reasoning graph-based reinforcement learning to cooperate mixed connected and autonomous traffic at unsignalized intersections

IF 7.6 · CAS Zone 1 (Engineering & Technology) · JCR Q1, Transportation Science & Technology

Transportation Research Part C-Emerging Technologies · Pub Date: 2024-08-22 · DOI: 10.1016/j.trc.2024.104807

Article URL: https://www.sciencedirect.com/science/article/pii/S0968090X24003280

Citations: 0

Abstract

Cooperation at unsignalized intersections in mixed traffic environments, where Connected and Autonomous Vehicles (CAVs) and Manually Driving Vehicles (MVs) coexist, holds promise for improving safety, efficiency, and energy savings. However, mixed traffic at unsignalized intersections presents major challenges, including the uncertainty of MV behavior, chain reactions, and diverse interactions. Following the idea of situation-aware cooperation, this paper proposes a Reasoning Graph-based Reinforcement Learning (RGRL) method, which integrates a Graph Neural Network (GNN) based policy with an environment that provides mixed traffic with uncertain behaviors. First, the observed scenario is represented graphically as a situation, using an interaction graph with connected but uncertain (bi-directional) edges. The situation reasoning process is formulated as a Reasoning Graph-based Markov Decision Process, which infers the vehicle sequence stage by stage so as to depict the entire situation sequentially. Then, a GNN-based policy is constructed, which uses Graph Convolutional Networks (GCN) to capture the interrelated chain reactions and Graph Attention Networks (GAT) to weigh the diverse interactions. Furthermore, an environment block is developed for training the policy; it provides trajectory generators for both CAVs and MVs, together with a reward function that considers social compliance, collision avoidance, efficiency, and energy savings. Finally, three Reinforcement Learning methods (D3QN, PPO, and SAC) are implemented in comparative tests to explore the applicability and strength of the framework. The test results show that D3QN outperformed the other two methods, reaching a larger converged reward at a similar convergence speed. Compared with multi-agent RL (MARL), the RGRL approach performed statistically better, reducing the number of severe conflicts by 77.78–94.12 %. RGRL reduced average and maximum travel times by 13.62–16.02 % and fuel consumption by 3.38–6.98 % at medium or high Market Penetration Rates (MPRs). Hardware-in-the-loop (HIL) and Vehicle-in-the-loop (VehIL) experiments were conducted to validate the effectiveness of the model.
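
The stage-by-stage reasoning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `reason_sequence`, the vehicle IDs, the arrival times, and the earliest-arrival rule standing in for the learned GNN policy are all assumptions made for illustration. The sketch only shows how choosing one vehicle per stage could turn the uncertain (bi-directional) edges of an interaction graph into directed leader-follower edges.

```python
def reason_sequence(vehicles, undecided, score):
    """Pick one vehicle per stage and orient its undecided edges.

    vehicles:  iterable of vehicle IDs (hypothetical names).
    undecided: set of frozenset({a, b}) pairs, one per uncertain edge.
    score:     callable mapping a vehicle ID to a priority (lower goes first);
               a stand-in for the learned policy, which is an assumption here.
    """
    remaining = set(vehicles)
    order = []      # inferred passing sequence
    directed = []   # resolved (leader, follower) edges
    while remaining:
        leader = min(remaining, key=score)   # placeholder for a policy action
        remaining.remove(leader)
        order.append(leader)
        # Every still-uncertain edge touching the leader now points away from it.
        for other in sorted(remaining):
            if frozenset((leader, other)) in undecided:
                directed.append((leader, other))
    return order, directed


# Hypothetical scenario: two CAVs and one MV with assumed arrival times (s).
arrival = {"cav1": 4.0, "mv1": 2.5, "cav2": 3.1}
# Assumed conflict pairs: mv1 conflicts with both CAVs; the CAVs do not conflict.
undecided = {frozenset(("cav1", "mv1")), frozenset(("mv1", "cav2"))}

order, directed = reason_sequence(arrival, undecided, arrival.get)
print(order)     # passing sequence, one vehicle per stage
print(directed)  # leader -> follower edges
```

In the framework the abstract describes, the choice at each stage would come from the trained policy rather than a fixed rule; the rule is kept trivial here so that the graph bookkeeping stays visible.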

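Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80
Self-citation rate: 12.00%
Articles published: 332
Review time: 64 days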

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.