MADDPG-GST for coordinated variable speed limit and ramp metering: A hybrid action deep reinforcement learning approach to bottleneck congestion mitigation

IF 3.1 | CAS Zone 3, Physics and Astrophysics | JCR Q2, PHYSICS, MULTIDISCIPLINARY

Physica A: Statistical Mechanics and its Applications Pub Date : 2025-10-14 DOI:10.1016/j.physa.2025.131045

Tun Qiu, Pan Liu, Zhibin Li, Chengcheng Xu, Kailai Qiu, Shunchao Wang

Volume 680, Article 131045. Not open access. Source: https://www.sciencedirect.com/science/article/pii/S0378437125006971

Citations: 0

Abstract

Expressway merging bottlenecks are major sources of traffic congestion, where insufficient coordination among multiple traffic streams leads to severe flow disruptions. Although variable speed limits (VSL) and ramp metering (RM) are commonly used to mitigate congestion, their independent operation and mismatched control scopes often result in suboptimal outcomes. To address this problem, the study proposes a coordinated VSL–RM strategy based on multi-agent deep reinforcement learning. The control task is modeled as a Markov Decision Process (MDP), allowing joint policy learning between decentralized VSL and RM agents. A customized Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is employed to dynamically optimize coordination policies. To bridge the gap between the discrete VSL and continuous RM action spaces, the Gumbel-Softmax Trick (GST) is integrated into the learning process for differentiable hybrid action optimization. Additionally, a transfer learning mechanism is incorporated to support efficient policy adaptation across diverse traffic scenarios. Simulation results under varying demand levels show that the proposed strategy achieves 7.3%–34.1% improvements in traffic efficiency and stability compared to traditional methods. It also demonstrates strong transferability, reducing retraining time by up to 63.7% and traffic delays by up to 62.7%, while maintaining robust control under overspeed disturbances and control lag conditions.
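The hybrid action idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the speed-limit set, the function names, and the squashing of the metering rate to [0, 1] are all assumptions made for illustration. The sketch shows only the forward sampling step of the Gumbel-Softmax trick, in which Gumbel noise is added to the logits over the discrete VSL options and a temperature-scaled softmax yields a relaxed one-hot vector. In a real training loop this would be done in an autodiff framework so that gradients can flow through the relaxed sample.

```python
import math
import random

# Hypothetical discrete VSL options in km/h (illustrative, not from the paper).
SPEED_LIMITS_KMH = [60, 80, 100, 120]


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample over discrete options."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_action(vsl_logits, rm_raw, tau=1.0, rng=random):
    """Combine a discrete VSL choice with a continuous RM rate (assumed in [0, 1])."""
    probs = gumbel_softmax(vsl_logits, tau, rng)
    # Hard choice for execution; the soft vector is what a critic would differentiate through.
    vsl_index = max(range(len(probs)), key=probs.__getitem__)
    # Logistic sigmoid written with tanh to avoid overflow for large |rm_raw|.
    rm_rate = 0.5 * (1.0 + math.tanh(rm_raw / 2.0))
    return probs, SPEED_LIMITS_KMH[vsl_index], rm_rate


if __name__ == "__main__":
    rng = random.Random(0)
    probs, limit, rate = hybrid_action([0.1, 2.0, -1.0, 0.3], rm_raw=0.0, tau=0.5, rng=rng)
    print(probs, limit, rate)
```

A lower temperature `tau` pushes the relaxed sample closer to a true one-hot vector, at the cost of higher-variance gradients; how the paper sets or anneals this temperature is not stated in the abstract.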


Source journal

Physica A: Statistical Mechanics and its Applications (Physics: Multidisciplinary)

CiteScore: 7.20

Self-citation rate: 9.10%

Articles published: 852

Review time: 6.6 months

About the journal: Physica A: Statistical Mechanics and its Applications is recognized by the European Physical Society. Physica A publishes research in the field of statistical mechanics and its applications. Statistical mechanics sets out to explain the behaviour of macroscopic systems by studying the statistical properties of their microscopic constituents. Applications of the techniques of statistical mechanics are widespread, and include: applications to physical systems such as solids, liquids and gases; applications to chemical and biological systems (colloids, interfaces, complex fluids, polymers and biopolymers, cell physics); and other interdisciplinary applications to, for instance, biological, economic and sociological systems.