Continuous Control of Autonomous Vehicles using Plan-assisted Deep Reinforcement Learning

Tanay Dwivedi, Tobias Betz, Florian Sauerbeck, P. Manivannan, M. Lienkamp
{"title":"Continuous Control of Autonomous Vehicles using Plan-assisted Deep Reinforcement Learning","authors":"Tanay Dwivedi, Tobias Betz, Florian Sauerbeck, P. Manivannan, M. Lienkamp","doi":"10.23919/ICCAS55662.2022.10003698","DOIUrl":null,"url":null,"abstract":"End-to-end deep reinforcement learning (DRL) is emerging as a promising paradigm for autonomous driving. Although DRL provides an elegant framework to accomplish final goals without extensive manual engineering, capturing plans and behavior using deep neural networks is still an unsolved issue. End-to-end architectures, as a result, are currently limited to simple driving scenarios, often performing sub-optimally when rare, unique conditions are encountered. We propose a novel plan-assisted deep reinforcement learning framework that, along with the typical state-space, leverages a “trajectory-space” to learn optimal control. While the trajectory-space, generated by an external planner, intrinsically captures the agent’s high-level plans, world models are used to understand the dynamics of the environment for learning behavior in latent space. An actor-critic network, trained in imagination, uses these latent features to predict policy and state-value function. Based primarily on DreamerV2 and Racing Dreamer, the proposed model is first trained in a simulator and eventually tested on the FITENTH race car. We evaluate our model for best lap times against parameter-tuned and learning-based controllers on unseen race tracks and demonstrate that it generalizes to complex scenarios where other approaches perform sub-optimally. Furthermore, we show the model’s enhanced stability as a trajectory tracker and establish the improvement in interpretability achieved by the proposed framework.","PeriodicalId":129856,"journal":{"name":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","volume":"46 14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICCAS55662.2022.10003698","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

End-to-end deep reinforcement learning (DRL) is emerging as a promising paradigm for autonomous driving. Although DRL provides an elegant framework to accomplish final goals without extensive manual engineering, capturing plans and behavior using deep neural networks is still an unsolved issue. End-to-end architectures, as a result, are currently limited to simple driving scenarios, often performing sub-optimally when rare, unique conditions are encountered. We propose a novel plan-assisted deep reinforcement learning framework that, along with the typical state-space, leverages a “trajectory-space” to learn optimal control. While the trajectory-space, generated by an external planner, intrinsically captures the agent’s high-level plans, world models are used to understand the dynamics of the environment for learning behavior in latent space. An actor-critic network, trained in imagination, uses these latent features to predict policy and state-value function. Based primarily on DreamerV2 and Racing Dreamer, the proposed model is first trained in a simulator and eventually tested on the F1TENTH race car. We evaluate our model for best lap times against parameter-tuned and learning-based controllers on unseen race tracks and demonstrate that it generalizes to complex scenarios where other approaches perform sub-optimally. Furthermore, we show the model’s enhanced stability as a trajectory tracker and establish the improvement in interpretability achieved by the proposed framework.
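The core idea, fusing an externally planned trajectory with the usual state observation before learning latent features and continuous control, can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it omits DreamerV2's recurrent world model and imagination training, and all names and dimensions (PlanAssistedEncoder, the LiDAR and waypoint sizes) are assumptions chosen for the example.

```python
# Minimal sketch of plan-assisted control (illustrative, not the paper's code):
# the observation is augmented with "trajectory-space" features from an external
# planner, encoded into a latent vector, and fed to an actor-critic head.
import torch
import torch.nn as nn


class PlanAssistedEncoder(nn.Module):
    """Fuses raw state features (e.g., a LiDAR scan) with planner waypoints."""

    def __init__(self, state_dim: int, traj_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + traj_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim), nn.ELU(),
        )

    def forward(self, state: torch.Tensor, traj: torch.Tensor) -> torch.Tensor:
        # Concatenate state-space and trajectory-space inputs before encoding.
        return self.net(torch.cat([state, traj], dim=-1))


class ActorCritic(nn.Module):
    """Predicts a continuous policy and a state-value from latent features."""

    def __init__(self, latent_dim: int, action_dim: int = 2):  # steer, throttle
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                   nn.Linear(256, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                    nn.Linear(256, 1))

    def forward(self, latent: torch.Tensor):
        return self.actor(latent), self.critic(latent)


# Usage: encode one observation and query policy and value.
encoder = PlanAssistedEncoder(state_dim=1080, traj_dim=20)  # assumed: 1080-beam scan + 10 (x, y) waypoints
policy = ActorCritic(latent_dim=128)
state = torch.randn(1, 1080)  # placeholder scan
traj = torch.randn(1, 20)     # placeholder planner waypoints
action, value = policy(encoder(state, traj))
```

Concatenating planner waypoints with the raw observation is the simplest way to expose the high-level plan to the policy. In the paper's framework, the latent features instead come from a world model trained to capture environment dynamics, and the actor-critic is optimized on imagined latent rollouts rather than real transitions.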