Safe reinforcement learning for high-speed autonomous racing

Cognitive Robotics | Pub Date: 2023-01-01 | DOI: 10.1016/j.cogr.2023.04.002

Benjamin D. Evans, Hendrik W. Jordaan, Herman A. Engelbrecht

Volume 3, pages 107-126. Journal article. Source: https://www.sciencedirect.com/science/article/pii/S2667241323000125

Citations: 1

Abstract

The conventional application of deep reinforcement learning (DRL) to autonomous racing requires the agent to crash during training, which limits training to simulation environments. Further, many DRL approaches still exhibit high crash rates after training, making them infeasible for real-world use. This paper addresses the problem of safely training DRL agents for autonomous racing. First, we present a Viability Theory-based supervisor that ensures the vehicle does not crash and remains within the friction limit while maintaining recursive feasibility. Second, we use the supervisor to ensure the vehicle does not crash during the training of DRL agents for high-speed racing. Evaluation in the open-source F1Tenth simulator demonstrates that our safety system can ensure the safety of a worst-case scenario planner on four test maps at speeds of up to 6 m/s. Training agents to race with the supervisor significantly improves sample efficiency, requiring only 10,000 steps. Our learning formulation produces more conservative, safer policies with slower lap times and a higher success rate, making our method feasible for physical vehicle racing. Enabling DRL agents to learn to race without ever crashing is a step towards using DRL on physical vehicles.
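The supervisor idea in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's implementation: it assumes a one-dimensional vehicle driving towards a wall, and the constants `A_MAX`, `DT`, `WALL` and all function names are hypothetical. A state counts as viable if full braking can still stop the vehicle before the wall. The supervisor passes the agent's action through when the resulting state stays viable and substitutes full braking otherwise. Because full braking keeps a viable state viable, the filter stays recursively feasible in this toy setting.

```python
from typing import Callable, Tuple

# All constants are illustrative assumptions, not values from the paper.
A_MAX = 5.0   # maximum acceleration and braking magnitude (m/s^2)
DT = 0.1      # control period (s)
WALL = 10.0   # position of the wall (m)
TOL = 1e-9    # tolerance for floating-point rounding

State = Tuple[float, float]  # (position, speed), speed is never negative


def step(state: State, accel: float) -> State:
    """Advance the toy vehicle by one control period under constant acceleration."""
    x, v = state
    if accel < 0.0 and v + accel * DT < 0.0:
        # The vehicle comes to rest within the period; it does not reverse.
        t_stop = v / -accel
        return (x + v * t_stop + 0.5 * accel * t_stop ** 2, 0.0)
    return (x + v * DT + 0.5 * accel * DT ** 2, v + accel * DT)


def is_viable(state: State) -> bool:
    """A state is viable if full braking stops the vehicle before the wall."""
    x, v = state
    return x + v * v / (2.0 * A_MAX) <= WALL + TOL


def supervise(state: State, proposed: float) -> Tuple[float, bool]:
    """Return (action to apply, whether the supervisor intervened)."""
    accel = max(-A_MAX, min(A_MAX, proposed))
    if is_viable(step(state, accel)):
        return accel, False
    # Full braking preserves viability, so it is always a safe fallback.
    return -A_MAX, True


def worst_case_agent(state: State) -> float:
    """An adversarial planner that always accelerates towards the wall."""
    return A_MAX


def run_episode(agent: Callable[[State], float], steps: int = 200) -> Tuple[float, int]:
    """Run a supervised episode; return the furthest position and intervention count."""
    state: State = (0.0, 0.0)
    max_x, interventions = 0.0, 0
    for _ in range(steps):
        accel, intervened = supervise(state, agent(state))
        interventions += int(intervened)
        state = step(state, accel)
        max_x = max(max_x, state[0])
    return max_x, interventions
```

Calling `run_episode(worst_case_agent)` exercises the filter against an agent that always tries to crash, loosely mirroring the worst-case planner evaluation mentioned in the abstract. In a learning setup, the intervention flag could also be fed back to the agent as a training signal, although how the paper does this is not stated here.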

Source journal

Cognitive Robotics

CiteScore

8.40

Self-citation rate

0.00%

Articles published