Continuous Control of Autonomous Vehicles using Plan-assisted Deep Reinforcement Learning

Tanay Dwivedi, Tobias Betz, Florian Sauerbeck, P. Manivannan, M. Lienkamp
{"title":"Continuous Control of Autonomous Vehicles using Plan-assisted Deep Reinforcement Learning","authors":"Tanay Dwivedi, Tobias Betz, Florian Sauerbeck, P. Manivannan, M. Lienkamp","doi":"10.23919/ICCAS55662.2022.10003698","DOIUrl":null,"url":null,"abstract":"End-to-end deep reinforcement learning (DRL) is emerging as a promising paradigm for autonomous driving. Although DRL provides an elegant framework to accomplish final goals without extensive manual engineering, capturing plans and behavior using deep neural networks is still an unsolved issue. End-to-end architectures, as a result, are currently limited to simple driving scenarios, often performing sub-optimally when rare, unique conditions are encountered. We propose a novel plan-assisted deep reinforcement learning framework that, along with the typical state-space, leverages a “trajectory-space” to learn optimal control. While the trajectory-space, generated by an external planner, intrinsically captures the agent’s high-level plans, world models are used to understand the dynamics of the environment for learning behavior in latent space. An actor-critic network, trained in imagination, uses these latent features to predict policy and state-value function. Based primarily on DreamerV2 and Racing Dreamer, the proposed model is first trained in a simulator and eventually tested on the FITENTH race car. We evaluate our model for best lap times against parameter-tuned and learning-based controllers on unseen race tracks and demonstrate that it generalizes to complex scenarios where other approaches perform sub-optimally. Furthermore, we show the model’s enhanced stability as a trajectory tracker and establish the improvement in interpretability achieved by the proposed framework.","PeriodicalId":129856,"journal":{"name":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","volume":"46 14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 22nd International Conference on Control, Automation and Systems (ICCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICCAS55662.2022.10003698","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

End-to-end deep reinforcement learning (DRL) is emerging as a promising paradigm for autonomous driving. Although DRL provides an elegant framework to accomplish final goals without extensive manual engineering, capturing plans and behavior using deep neural networks is still an unsolved issue. End-to-end architectures, as a result, are currently limited to simple driving scenarios, often performing sub-optimally when rare, unique conditions are encountered. We propose a novel plan-assisted deep reinforcement learning framework that, along with the typical state-space, leverages a “trajectory-space” to learn optimal control. While the trajectory-space, generated by an external planner, intrinsically captures the agent’s high-level plans, world models are used to understand the dynamics of the environment for learning behavior in latent space. An actor-critic network, trained in imagination, uses these latent features to predict policy and state-value function. Based primarily on DreamerV2 and Racing Dreamer, the proposed model is first trained in a simulator and eventually tested on the F1TENTH race car. We evaluate our model for best lap times against parameter-tuned and learning-based controllers on unseen race tracks and demonstrate that it generalizes to complex scenarios where other approaches perform sub-optimally. Furthermore, we show the model’s enhanced stability as a trajectory tracker and establish the improvement in interpretability achieved by the proposed framework.
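The core idea, fusing an externally planned trajectory with the usual state observation before learning latent features and continuous control, can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it omits DreamerV2's recurrent world model and imagination training, and all names and dimensions (PlanAssistedEncoder, the LiDAR and waypoint sizes) are assumptions chosen for the example.

```python
# Minimal sketch of plan-assisted control (illustrative, not the paper's code):
# the observation is augmented with "trajectory-space" features from an external
# planner, encoded into a latent vector, and fed to an actor-critic head.
import torch
import torch.nn as nn


class PlanAssistedEncoder(nn.Module):
    """Fuses raw state features (e.g., a LiDAR scan) with planner waypoints."""

    def __init__(self, state_dim: int, traj_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + traj_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim), nn.ELU(),
        )

    def forward(self, state: torch.Tensor, traj: torch.Tensor) -> torch.Tensor:
        # Concatenate state-space and trajectory-space inputs before encoding.
        return self.net(torch.cat([state, traj], dim=-1))


class ActorCritic(nn.Module):
    """Predicts a continuous policy and a state-value from latent features."""

    def __init__(self, latent_dim: int, action_dim: int = 2):  # steer, throttle
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                   nn.Linear(256, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                    nn.Linear(256, 1))

    def forward(self, latent: torch.Tensor):
        return self.actor(latent), self.critic(latent)


# Usage: encode one observation and query policy and value.
encoder = PlanAssistedEncoder(state_dim=1080, traj_dim=20)  # assumed: 1080-beam scan + 10 (x, y) waypoints
policy = ActorCritic(latent_dim=128)
state = torch.randn(1, 1080)  # placeholder scan
traj = torch.randn(1, 20)     # placeholder planner waypoints
action, value = policy(encoder(state, traj))
```

Concatenating planner waypoints with the raw observation is the simplest way to expose the high-level plan to the policy. In the paper's framework, the latent features instead come from a world model trained to capture environment dynamics, and the actor-critic is optimized on imagined latent rollouts rather than real transitions.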