Safe Reinforcement Learning on the Constraint Manifold: Theory and Applications

IF 9.4 · CAS Zone 1 (Computer Science) · JCR Q1 (Robotics)

IEEE Transactions on Robotics · Pub Date: 2025-03-06 · DOI: 10.1109/TRO.2025.3567477

Puze Liu; Haitham Bou-Ammar; Jan Peters; Davide Tateo

IEEE Transactions on Robotics, vol. 41, pp. 3442-3461. https://ieeexplore.ieee.org/document/10989544/

Citations: 0

Abstract

Integrating learning-based techniques, especially reinforcement learning, into robotics is promising for solving complex problems in unstructured environments. Most of the existing approaches rely on training in carefully calibrated simulators before being deployed on real robots, often without real-world fine-tuning. While effective in controlled settings, this framework falls short in applications where precise simulation is unavailable or the environment is too complex to model. Instead, on-robot learning, which learns by interacting directly with the real world, offers a promising alternative. One major problem for on-robot reinforcement learning is ensuring safety, as uncontrolled exploration can cause catastrophic damage to the robot or the environment. Indeed, safety specifications, often represented as constraints, can be complex and nonlinear, making safety challenging to guarantee in learning systems. In this article, we show how we can impose complex safety constraints on learning-based robotics systems in a principled manner, both from theoretical and practical points of view. Our approach is based on the concept of the constraint manifold, representing the set of safe robot configurations. Exploiting differential geometry techniques, i.e., the tangent space, we can construct a safe action space, allowing learning agents to sample arbitrary actions while ensuring safety. We demonstrate the method's effectiveness in a real-world robot air hockey task, showing that our method can handle high-dimensional tasks with complex constraints.
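The tangent-space idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a single equality constraint c(q) = 0 (a 2-D point kept on the unit circle), takes the null space of the constraint Jacobian as the tangent space, and maps an arbitrary agent action to a velocity in that space plus a correction term that pulls the state back toward the manifold. The names `constraint`, `jacobian`, `tangent_basis`, `safe_velocity` and the parameter `gain` are all hypothetical, chosen for illustration only.

```python
import numpy as np

def constraint(q):
    # Hypothetical toy constraint: c(q) = |q|^2 - 1 = 0 (unit circle).
    return np.array([q @ q - 1.0])

def jacobian(q):
    # Jacobian of the toy constraint, shape (1, 2).
    return 2.0 * q[None, :]

def tangent_basis(J, tol=1e-10):
    # Orthonormal basis of the null space of J, obtained from the SVD.
    # Its columns span the tangent space of the constraint manifold.
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def safe_velocity(q, action, gain=5.0):
    # Tangential motion chosen by the agent, plus a term that drives
    # the constraint value back toward zero (assumed error correction).
    J = jacobian(q)
    N = tangent_basis(J)
    return N @ action - gain * np.linalg.pinv(J) @ constraint(q)

rng = np.random.default_rng(0)
q = np.array([1.0, 0.0])      # start on the manifold
dt = 0.01                     # Euler integration step
max_violation = 0.0
for _ in range(1000):
    a = rng.uniform(-1.0, 1.0, size=1)   # arbitrary action in the tangent space
    q = q + dt * safe_velocity(q, a)
    max_violation = max(max_violation, abs(constraint(q)[0]))

print("largest constraint violation:", max_violation)
```

In this sketch the agent samples freely from a one-dimensional action space, yet the state stays close to the circle because every action is expressed in tangent coordinates. The small residual violation comes from the discrete Euler step and shrinks with a smaller `dt`.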


Source journal

IEEE Transactions on Robotics (Engineering & Technology: Robotics)

CiteScore: 14.90
Self-citation rate: 5.10%
Articles published: 259
Review time: 6.0 months

Journal description: The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles. Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.