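Combining multi-agent deep deterministic policy gradient and rerouting technique to improve traffic network performance under mixed traffic conditions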

SIMULATION, Pub Date: 2024-03-22, DOI: 10.1177/00375497241237831

Hung Tuan Trinh, Sang-Hoon Bae, Duy Quang Tran


Citations: 0

Abstract


In the future, mixed traffic flow will include two types of vehicles: connected autonomous vehicles (CAVs) and human-driven vehicles (HDVs). CAVs are emerging as a new solution that disrupts the traditional transportation system: they share real-time data with each other and with roadside units (RSUs) for network management. Reinforcement learning (RL) is a promising approach to traffic signal management in complex urban areas because it can leverage information gathered from CAVs. In particular, coordinating signal management across many intersections is a critical challenge in multi-agent reinforcement learning (MARL). In line with this vision, we propose an approach that combines an actor–critic network–based multi-agent deep deterministic policy gradient (MADDPG) model with a rerouting technique (RT) to increase traffic performance in vehicular networks. This algorithm overcomes the inherent non-stationarity of Q-learning and the high variance of policy gradient (PG) algorithms. Based on centralized learning with decentralized execution, the MADDPG model employs one actor and one critic for each agent. The actor network uses local information to execute actions, while the critic network is trained with extra information, including the states and actions of other agents. Through this centralized learning process, agents can coordinate with each other, diminishing the influence of an unstable environment. Unlike previous studies, we not only manage traffic light systems but also consider the effect of platooning vehicles on increasing throughput. Experimental results show that our model outperforms other models in terms of traffic performance in different scenarios.
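The split between decentralized actors and centralized critics described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear layers stand in for the deep networks, the class name `Agent`, the dimensions, and the random observations are all hypothetical, and no training update is shown. It only demonstrates which inputs each network sees.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer: one output per weight row."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class Agent:
    """One signal controller with its own actor and its own critic (hypothetical)."""

    def __init__(self, obs_dim, act_dim, n_agents, rng):
        # Actor input: this agent's local observation only.
        self.actor = init_layer(act_dim, obs_dim, rng)
        # Critic input: observations and actions of ALL agents, concatenated.
        self.critic = init_layer(1, n_agents * (obs_dim + act_dim), rng)

    def act(self, local_obs):
        # Decentralized execution: a deterministic action from local information.
        return [math.tanh(z) for z in linear(self.actor, local_obs)]

    def q_value(self, all_obs, all_actions):
        # Centralized learning: the critic scores the joint state and joint action.
        joint = [v for obs in all_obs for v in obs]
        joint += [v for action in all_actions for v in action]
        return linear(self.critic, joint)[0]


# Assumed sizes: 3 intersections, 4 observation features, 2 action values each.
rng = random.Random(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2
agents = [Agent(OBS_DIM, ACT_DIM, N_AGENTS, rng) for _ in range(N_AGENTS)]
observations = [[rng.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]

actions = [agent.act(obs) for agent, obs in zip(agents, observations)]
q_values = [agent.q_value(observations, actions) for agent in agents]
```

In this sketch, changing another agent's observation leaves an agent's own action unchanged but alters its critic's estimate, which is the mechanism that lets the critics account for the other agents during training while execution stays local.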
