Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng, Zilin Huang, Sikai Chen
Communications in Transportation Research (IF 12.5, JCR Q1, Transportation)
Published: 2024-10-18
DOI: 10.1016/j.commtr.2024.100142
URL: https://www.sciencedirect.com/science/article/pii/S2772424724000258
Citations: 0
Abstract
Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency than model-free RL by utilizing a virtual environment model. However, obtaining sufficiently accurate representations of environmental dynamics is challenging because of uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the intelligent driver model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to connected automated vehicle (CAV) trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flows. The experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared with the baseline agents in terms of sample efficiency, traffic flow smoothness and traffic mobility.
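To make the hybrid environment model concrete, the sketch below pairs the intelligent driver model (IDM) car-following law with a residual-dynamics term, as the abstract describes. This is a minimal illustration, not the paper's implementation: the IDM parameter values are typical literature defaults (the paper's calibration is not given here), and `residual_accel` is a zero placeholder standing in for the neural network that learns the residual dynamics.

```python
import math

# IDM parameters -- typical values from the car-following literature;
# the paper's exact calibration is an assumption here.
V0 = 30.0     # desired speed (m/s)
T = 1.5       # desired time headway (s)
A_MAX = 1.0   # maximum acceleration (m/s^2)
B = 1.5       # comfortable deceleration (m/s^2)
S0 = 2.0      # minimum gap (m)
DELTA = 4.0   # acceleration exponent

def idm_accel(gap, speed, lead_speed):
    """Car-following acceleration from the IDM (the 'basic dynamics')."""
    dv = speed - lead_speed  # approach rate toward the leader
    s_star = S0 + speed * T + speed * dv / (2.0 * math.sqrt(A_MAX * B))
    return A_MAX * (1.0 - (speed / V0) ** DELTA - (max(s_star, 0.0) / gap) ** 2)

def residual_accel(gap, speed, lead_speed):
    """Placeholder for the learned residual dynamics. In the paper this is a
    neural network trained to capture what the IDM misses; here it returns
    zero, so the model reduces to pure IDM."""
    return 0.0

def step(gap, speed, lead_speed, dt=0.1):
    """One step of the hybrid environment model: IDM base + learned residual."""
    a = idm_accel(gap, speed, lead_speed) + residual_accel(gap, speed, lead_speed)
    new_speed = max(speed + a * dt, 0.0)          # no reversing
    new_gap = gap + (lead_speed - speed) * dt     # gap evolves with relative speed
    return new_gap, new_speed

# Example: a follower at 20 m/s, 30 m behind a leader also at 20 m/s.
# The desired gap s* (32 m) exceeds the actual gap, so the IDM decelerates.
gap, speed = step(30.0, 20.0, 20.0)
```

The residual-policy side of the framework follows the same additive pattern: the applied control is a classical controller's action plus a correction learned by RL, so the agent starts from the controller's behavior rather than from scratch.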