Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

arXiv (CS - Machine Learning), published 2024-09-18, arXiv:2409.12045

Jonas Günster, Puze Liu, Jan Peters, Davide Tateo

Abstract

Safety is one of the key issues preventing the deployment of reinforcement learning techniques on real-world robots. While most approaches in Safe Reinforcement Learning require no prior knowledge of constraints or robot kinematics and rely solely on data, they are often difficult to deploy in complex real-world settings. In contrast, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of running the learning algorithm directly on the real robot. Unfortunately, while an approximate model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically or too expensive to compute, and the long-term safety requirements may be difficult to envision a priori. In this paper, we bridge this gap by extending the safe exploration method ATACOM with learnable constraints, focusing in particular on ensuring long-term safety and handling uncertainty. Our approach is competitive with or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.
