{"title":"Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey","authors":"Milan Ganai;Sicun Gao;Sylvia L. Herbert","doi":"10.1109/OJCSYS.2024.3449138","DOIUrl":null,"url":null,"abstract":"Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a litany of proposed methods addresses this limitation by computing the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"310-324"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10645063","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10645063/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Recent literature has proposed approaches that learn high-performance control policies while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems, primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a variety of methods have been proposed to address this limitation by computing the reachability value function jointly with the learned control policy, scaling HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even the reward performance, of learned control policies, and can solve challenging tasks such as those with dynamic obstacles and/or lidar- or vision-based observations. In this survey, we review recent developments in HJ reachability estimation for reinforcement learning, providing a foundation for further research into reliability in high-dimensional systems.
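To make the dynamic programming connection concrete, the sketch below is a minimal toy example of our own (not an implementation from the survey or any surveyed work): tabular value iteration for an HJ-reachability-style safety value function on a one-dimensional double integrator. It uses a discounted safety Bellman backup of the form V(x) = (1 - gamma) * l(x) + gamma * min( l(x), max_u V(f(x, u)) ), a common formulation in this literature for combining reachability analysis with reinforcement-learning-style value estimation; the grid resolution, dynamics, discount factor, and margin function l are assumptions chosen purely for brevity.

import numpy as np

# Illustrative sketch only: tabular value iteration with a discounted safety Bellman backup.
# State (p, w) = (position, velocity) on a grid; control u = acceleration in {-1, 0, +1}.
P = np.linspace(-2.0, 2.0, 41)      # position grid (assumed bounds/resolution)
W = np.linspace(-2.0, 2.0, 41)      # velocity grid
U = np.array([-1.0, 0.0, 1.0])      # discrete control set
dt, gamma = 0.1, 0.95               # time step and discount factor (assumptions)

def margin(p, w):
    # Signed safety margin l(x): positive when |p| <= 1 (safe), negative otherwise.
    return 1.0 - abs(p)

def step(p, w, u):
    # Euler-discretized double-integrator dynamics, clipped to the grid bounds.
    return (np.clip(p + w * dt, P[0], P[-1]),
            np.clip(w + u * dt, W[0], W[-1]))

def idx(grid, x):
    # Nearest-neighbor grid index (crude interpolation, kept simple for the sketch).
    return int(np.argmin(np.abs(grid - x)))

V = np.array([[margin(p, w) for w in W] for p in P])
for _ in range(300):
    V_next = np.empty_like(V)
    for i, p in enumerate(P):
        for j, w in enumerate(W):
            # max over controls of the value at the successor state
            best = max(V[idx(P, pn), idx(W, wn)]
                       for pn, wn in (step(p, w, u) for u in U))
            # discounted safety Bellman backup
            V_next[i, j] = (1 - gamma) * margin(p, w) + gamma * min(margin(p, w), best)
    V = V_next

# States with V > 0 approximate the largest control-invariant set that stays within |p| <= 1.
print("fraction of grid estimated safe:", float((V > 0).mean()))

Grid-based value iteration like this is exactly what scales exponentially with the state dimension; the methods covered in the survey instead learn the same value function with function approximation (e.g., neural networks) alongside the control policy.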