Improved multi-agent deep reinforcement learning-based integrated control for mixed traffic flow in a freeway corridor with multiple bottlenecks

IF 7.6 · Zone 1 (Engineering & Technology) · Q1 · TRANSPORTATION SCIENCE & TECHNOLOGY

Transportation Research Part C-Emerging Technologies · Pub Date: 2025-03-05 · DOI: 10.1016/j.trc.2025.105077

Lei Han, Lun Zhang, Haixiao Pan

Volume 174, Article 105077 · https://www.sciencedirect.com/science/article/pii/S0968090X25000816

Citations: 0

Abstract

A major challenge for the emerging mixed traffic flow system, composed of Connected and Automated Vehicles (CAVs) and Human-Driven Vehicles (HDVs), is the lack of adequate traffic control measures, especially in a large freeway corridor with multiple bottlenecks. Multi-agent deep reinforcement learning exhibits significant advantages, such as fast response, high flexibility, strong adaptability, low computational burden, and collaborative optimization. These features enable it to achieve superior efficiency and robustness in handling dynamically changing traffic environments and large-scale traffic control problems. Inspired by this, we propose a novel Integrated Traffic Control (ITC) strategy based on an Improved Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (IPMATD3) algorithm in the mixed traffic environment (abbreviated as IPMATD3-based ITC). Specifically, the proposed IPMATD3-based ITC approach seeks to coordinate multiple Ramp Metering (RM) and Variable Speed Limit (VSL) controllers along a freeway corridor, with the objectives of improving traffic mobility and efficiency, enhancing safety, and reducing emissions. The proposed method uses a centralized-training, decentralized-execution paradigm to learn the joint actions of all traffic controllers in high-dimensional state and action spaces. A hybrid reward function is developed by simultaneously considering the above objectives to optimize traffic control performance. Then, a rank-based prioritized experience replay mechanism is incorporated into the conventional MATD3 algorithm to improve learning efficiency. A real-world freeway corridor is selected to test the proposed control method, and its performance is compared with several state-of-the-art methods. The simulation results demonstrate that the proposed method achieves remarkable control performance at a 10% CAV Penetration Rate (PR), effectively reducing the spatiotemporal extent of freeway traffic congestion. The proposed method outperforms the other approaches in improving freeway traffic efficiency, mobility, safety, and environmental sustainability. Increasing the PR improves the performance of the various methods and benefits traffic operations. However, when the PR reaches higher levels, the marginal benefits of further increases become less pronounced.
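The rank-based prioritized experience replay mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the generic rank-based scheme, in which a stored transition is sampled with probability proportional to (1 / rank)^alpha, where rank 1 is the transition with the largest absolute TD error, and an importance-sampling weight (N * P(i))^(-beta) corrects the resulting bias. The function names, the values of `alpha` and `beta`, and the per-batch normalization of the weights are all assumptions made for illustration.

```python
import random


def rank_based_probabilities(td_errors, alpha=0.7):
    """Return P(i) proportional to (1 / rank(i)) ** alpha.

    rank(i) == 1 for the transition with the largest |TD error|.
    alpha is a hypothetical setting, not a value taken from the paper.
    """
    # Indices ordered from largest to smallest absolute TD error.
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]), reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, idx in enumerate(order, start=1):
        priorities[idx] = (1.0 / rank) ** alpha
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors, batch_size, alpha=0.7, beta=0.5, rng=None):
    """Sample buffer indices and their importance-sampling weights.

    Weights are scaled by the largest weight in the batch (a common
    simplification) so that every weight lies in (0, 1].
    """
    rng = rng or random.Random()
    probs = rank_based_probabilities(td_errors, alpha)
    n = len(td_errors)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

In a MATD3-style learner, the returned weights would typically scale each sampled transition's critic loss, and the stored TD errors would be refreshed after every update so that the ranks stay current.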

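Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology - Transportation Science & Technology)

CiteScore: 15.80

Self-citation rate: 12.00%

Articles published: 332

Review time: 64 days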

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.