Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning
Jonas Günster, Puze Liu, Jan Peters, Davide Tateo
arXiv - CS - Machine Learning, 2024-09-18
https://doi.org/arxiv-2409.12045
Citations: 0
Abstract
Safety is one of the key issues preventing the deployment of reinforcement
learning techniques in real-world robots. While most approaches in the Safe
Reinforcement Learning area do not require prior knowledge of constraints and
robot kinematics and rely solely on data, it is often difficult to deploy them
in complex real-world settings. In contrast, model-based approaches that
incorporate prior knowledge of the constraints and dynamics into the learning
framework have proven able to run the learning algorithm directly on
the real robot. Unfortunately, while an approximate model of the robot
dynamics is often available, the safety constraints are task-specific and hard
to obtain: they may be too complicated to encode analytically or too expensive to
compute, or the long-term safety requirements may be difficult to envision a
priori. In this paper, we bridge this gap by extending the safe
exploration method ATACOM with learnable constraints, with a particular focus
on ensuring long-term safety and handling uncertainty. Our approach is
competitive with or superior to state-of-the-art methods in final performance while
maintaining safer behavior during training.