Safe Reinforcement Learning for Autonomous Driving by Using Disturbance-Observer-Based Control Barrier Functions

Impact factor: 14.3; Zone 1 (Engineering Technology); JCR Q1, Computer Science, Artificial Intelligence

IEEE Transactions on Intelligent Vehicles Pub Date : 2024-09-20 DOI:10.1109/TIV.2024.3463468

Zhengyu Hou;Wenjun Liu;Alois Knoll

IEEE Transactions on Intelligent Vehicles, vol. 10, no. 6, pp. 3782-3791. https://ieeexplore.ieee.org/document/10684598/


Abstract

Recently, reinforcement learning (RL) has been increasingly used in autonomous driving (AD) navigation control systems. However, most RL-based AD navigation control systems remain in the simulation stage. Their practical application is limited by growing safety concerns: the safety of these algorithms remains uncertain when they are confronted with real-world disturbances and vehicle model uncertainties. To enhance the safety of RL, we propose a disturbance observer (DOB) based safe soft actor-critic (SAC) algorithm, DOB-SAC, which combines the SAC algorithm with a safety constraints filter composed of a DOB and a control barrier function (CBF). When the SAC agent's action output is unsafe, the safety constraints filter alters it. We employ a DOB to accurately estimate the difference between the nominal model of the vehicle and the actual model, i.e., the lumped disturbances, which yields a more accurate vehicle model. To ensure the safety of DOB-SAC under complex and dynamically changing environmental conditions, a further predictive safety constraint is defined based on ideas from model predictive control (MPC). The safe action is then computed by safety-critical optimal control using the DOB-compensated vehicle model, the CBF, and the predictive safety constraints. We discuss the SAC architecture and training details, and investigate the effectiveness of CBF in modeling safety constraints. Joint simulations are conducted in scenarios with static obstacles and in intersection scenes with dynamic obstacles.
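The filter structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy one-dimensional model `x_dot = u + d` in place of a vehicle model, a barrier `h(x) = x_obs - x`, a first-order finite-difference disturbance estimate in place of the paper's DOB, and a constant stand-in `u_rl` for the SAC action. The names `cbf_filter`, `dob_update`, and `simulate` and all numeric values are hypothetical. With a single CBF constraint the safety QP has a closed-form solution (a projection onto a half-space), so no solver is needed. The predictive (MPC-style) constraint is omitted.

```python
import numpy as np


def cbf_filter(u_rl, grad_h, h, f, g, d_hat, alpha=1.0):
    """Minimally modify u_rl so that dh/dt + alpha*h >= 0 under the
    DOB-compensated model x_dot = f + g @ u + d_hat."""
    # Rewrite the CBF condition as a single linear constraint a @ u <= b.
    a = -(g.T @ grad_h)
    b = grad_h @ (f + d_hat) + alpha * h
    violation = a @ u_rl - b
    if violation <= 0.0 or not np.any(a):
        return u_rl  # already safe, or the input cannot affect h
    # Closed-form solution of min ||u - u_rl||^2 s.t. a @ u <= b.
    return u_rl - a * violation / (a @ a)


def dob_update(d_hat, x, x_next, f, g, u, dt, gain=5.0):
    """First-order disturbance estimate driven by the one-step model residual
    (a simplification assumed here, not the paper's observer)."""
    residual = (x_next - x) / dt - (f + g @ u)
    return d_hat + dt * gain * (residual - d_hat)


def simulate(use_dob, steps=2000, dt=0.01):
    """Drive toward an obstacle at x_obs while an unmodeled disturbance pushes forward.
    Returns the smallest barrier value seen and the final disturbance estimate."""
    x = np.array([0.0])
    x_obs = 10.0
    d_true = np.array([0.5])    # lumped disturbance, unknown to the filter
    d_hat = np.zeros(1)         # estimate; stays zero when use_dob is False
    f = np.zeros(1)             # nominal drift
    g = np.array([[1.0]])       # nominal input matrix
    grad_h = np.array([-1.0])   # gradient of h(x) = x_obs - x
    u_rl = np.array([2.0])      # stand-in for the SAC agent's action
    min_h = np.inf
    for _ in range(steps):
        h = x_obs - x[0]
        min_h = min(min_h, h)
        u = cbf_filter(u_rl, grad_h, h, f, g, d_hat)
        x_next = x + dt * (f + g @ u + d_true)  # true dynamics, Euler step
        if use_dob:
            d_hat = dob_update(d_hat, x, x_next, f, g, u, dt)
        x = x_next
    return float(min_h), float(d_hat[0])


if __name__ == "__main__":
    print("without DOB:", simulate(use_dob=False))
    print("with DOB:   ", simulate(use_dob=True))
```

In this toy setting, the filter without disturbance compensation relies on the nominal model and lets the state cross the barrier, while the compensated filter keeps `h` non-negative once the estimate has converged. This is the motivation the abstract gives for pairing the CBF with a DOB.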


Source journal

IEEE Transactions on Intelligent Vehicles (subject area: Mathematics-Control and Optimization)

CiteScore: 12.10

Self-citation rate: 13.40%

Articles published: 177

About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) publishes peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Its focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in state-of-the-art developments and progress in research and applications related to intelligent vehicles.