Safe Reinforcement Learning for Autonomous Driving by Using Disturbance-Observer-Based Control Barrier Functions

Impact factor: 14.3 · Zone 1 (Engineering and Technology) · JCR Q1 (Computer Science, Artificial Intelligence)
Zhengyu Hou; Wenjun Liu; Alois Knoll
DOI: 10.1109/TIV.2024.3463468
Journal: IEEE Transactions on Intelligent Vehicles, vol. 10, no. 6, pp. 3782-3791
Published: 2024-09-20
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/10684598/ (indexed via Semantic Scholar)
Citations: 0

Abstract

Reinforcement learning (RL) is increasingly used in autonomous driving (AD) navigation control systems. However, most RL-based AD navigation control systems remain at the simulation stage, and their practical application is limited by growing safety concerns: the safety of these algorithms remains uncertain when they are confronted with real-world disturbances and vehicle model uncertainties. To enhance the safety of RL, we propose a disturbance-observer (DOB)-based safe soft actor-critic (SAC) algorithm that combines SAC with a safety constraint filter composed of a DOB and a control barrier function (CBF). When the SAC agent's action output is unsafe, the safety constraint filter alters it. We employ the DOB to accurately estimate the difference between the nominal vehicle model and the actual model, i.e., the lumped disturbance, which yields a more accurate vehicle model. To ensure the safety of DOB-SAC under complex and dynamically changing environmental conditions, we further define a predictive safety constraint based on ideas from model predictive control (MPC). The safe action is computed by safety-critical optimal control using the DOB-compensated vehicle model, the CBF, and the predictive safety constraint. We discuss the SAC architecture and training details, and investigate the effectiveness of the CBF in modeling safety constraints. Joint simulations are conducted in scenarios with static obstacles and in intersection scenes with dynamic obstacles.
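The interplay between the DOB and the CBF filter described in the abstract can be illustrated with a minimal sketch. It is not the paper's vehicle model or implementation: it assumes a hypothetical one-dimensional system `x_dot = u + d` with a constant unknown disturbance `d`, a fixed obstacle position, and a constant stand-in for the SAC action. All names (`DisturbanceObserver`, `cbf_filter`, `simulate`) and all gains are illustrative assumptions. For a single constraint and a scalar input, the usual CBF quadratic program reduces to a clip, which is used here in place of a QP solver; the MPC-style predictive constraint is omitted.

```python
from dataclasses import dataclass


@dataclass
class DisturbanceObserver:
    """First-order observer for the assumed toy model x_dot = u + d."""
    gain: float      # observer gain l > 0 (Euler update needs l * dt < 2)
    z: float = 0.0   # internal observer state

    def estimate(self, x: float) -> float:
        # d_hat = z + l * x; its error obeys e_dot = -l * e for constant d
        return self.z + self.gain * x

    def update(self, x: float, u: float, dt: float) -> None:
        # z_dot = -l * (u + d_hat), integrated with forward Euler
        self.z += dt * (-self.gain * (u + self.estimate(x)))


def cbf_filter(u_nominal: float, x: float, d_hat: float,
               x_obstacle: float, margin: float, alpha: float) -> float:
    """Closest action to u_nominal satisfying h_dot >= -alpha * h."""
    h = x_obstacle - margin - x      # barrier: h >= 0 means safe
    u_max = alpha * h - d_hat        # from -(u + d_hat) >= -alpha * h
    return min(u_nominal, u_max)     # closed-form solution of the 1-D QP


def simulate(use_dob: bool, steps: int = 2000, dt: float = 0.01,
             d_true: float = 0.5):
    """Drive toward an obstacle; return final x, final d_hat, smallest h."""
    x, x_obstacle, margin, alpha = 0.0, 10.0, 1.0, 1.0
    dob = DisturbanceObserver(gain=5.0)
    min_h = float("inf")
    for _ in range(steps):
        u_rl = 2.0                   # stand-in for the SAC agent's action
        d_hat = dob.estimate(x)
        u = cbf_filter(u_rl, x, d_hat if use_dob else 0.0,
                       x_obstacle, margin, alpha)
        dob.update(x, u, dt)
        x += dt * (u + d_true)       # true plant includes the disturbance
        min_h = min(min_h, x_obstacle - margin - x)
    return x, dob.estimate(x), min_h


if __name__ == "__main__":
    for flag in (True, False):
        x, d_hat, min_h = simulate(use_dob=flag)
        print(f"use_dob={flag}: x={x:.3f}, d_hat={d_hat:.3f}, min_h={min_h:.3f}")
```

In this toy setup the compensated filter keeps the barrier value positive, while the uncompensated filter lets the state settle roughly `d_true / alpha` inside the safety margin, which is the kind of model-mismatch violation the DOB is meant to remove.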
Source journal: IEEE Transactions on Intelligent Vehicles
Subject category: Mathematics, Control and Optimization
CiteScore: 12.10
Self-citation rate: 13.40%
Articles published: 177
About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.