Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

arXiv - CS - Artificial Intelligence | Pub Date: 2024-08-30 | DOI: arxiv-2408.17380

Zihao Sheng, Zilin Huang, Sikai Chen


Citations: 0

Abstract

Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework aimed at enhancing learning efficiency by infusing established expert knowledge into the learning process and avoiding the issue of beginning from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch. The proposed approach is applied to CAV trajectory control tasks for the dissipation of stop-and-go waves in mixed traffic flow. Experimental results demonstrate that our proposed approach enables the CAV agent to achieve superior performance in trajectory control compared to the baseline agents in terms of sample efficiency, traffic flow smoothness and traffic mobility. The source code and supplementary materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.
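The two ideas in the abstract, a virtual environment model built from IDM plus a learned residual and a control policy built from a traditional controller plus a residual RL correction, can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation (see their repository for that). The IDM formula is the standard one; the parameter values, the `State` layout, the function names, the acceleration limits, and the callables `residual_model`, `base_controller`, and `residual_policy` are all assumptions made for this example.

```python
import math
from typing import Callable, Tuple

# Assumed state layout: (gap to leader [m], own speed [m/s], leader speed [m/s]).
State = Tuple[float, float, float]


def idm_acceleration(
    state: State,
    desired_speed: float = 30.0,   # v0 [m/s], illustrative value
    time_headway: float = 1.5,     # T [s], illustrative value
    min_gap: float = 2.0,          # s0 [m], illustrative value
    max_accel: float = 1.0,        # a [m/s^2], illustrative value
    comfort_decel: float = 1.5,    # b [m/s^2], illustrative value
    delta: float = 4.0,            # acceleration exponent
) -> float:
    """Standard Intelligent Driver Model acceleration for a following vehicle."""
    gap, speed, lead_speed = state
    closing_speed = speed - lead_speed
    # Desired dynamic gap s* = s0 + max(0, v*T + v*dv / (2*sqrt(a*b))).
    desired_gap = min_gap + max(
        0.0,
        speed * time_headway
        + speed * closing_speed / (2.0 * math.sqrt(max_accel * comfort_decel)),
    )
    return max_accel * (
        1.0 - (speed / desired_speed) ** delta - (desired_gap / gap) ** 2
    )


def model_acceleration(
    state: State, residual_model: Callable[[State], float]
) -> float:
    """Virtual environment model: IDM base dynamics plus a learned residual.

    `residual_model` stands in for a trained neural network (hypothetical).
    """
    return idm_acceleration(state) + residual_model(state)


def residual_control(
    state: State,
    base_controller: Callable[[State], float],
    residual_policy: Callable[[State], float],
    accel_limits: Tuple[float, float] = (-3.0, 3.0),  # assumed actuator limits
) -> float:
    """CAV action: traditional controller output plus an RL correction, clipped."""
    low, high = accel_limits
    action = base_controller(state) + residual_policy(state)
    return min(high, max(low, action))


if __name__ == "__main__":
    state = (20.0, 10.0, 8.0)
    # With a zero residual, both compositions fall back to their base component.
    print(model_acceleration(state, lambda s: 0.0))
    print(residual_control(state, idm_acceleration, lambda s: 0.0))
```

With a zero residual the agent behaves exactly like the base controller, which is one way to read the abstract's point about not learning from scratch: training only has to find a correction on top of behavior that is already reasonable.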
