Constrained Dirichlet Distribution Policy: Guarantee Zero Constraint Violation Reinforcement Learning for Continuous Robotic Control
Authors: Jianming Ma; Zhanxiang Cao; Yue Gao
Journal: IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11690-11697
Published: 2024-11-01
DOI: 10.1109/LRA.2024.3490392
URL: https://ieeexplore.ieee.org/document/10740920/
Learning-based controllers show promising performance in robotic control tasks. However, they still pose safety risks because it is difficult to ensure that complex action constraints are satisfied. We propose a novel action-constrained reinforcement learning method that transforms the constrained action space into its dual space and uses a Dirichlet distribution policy to guarantee strict constraint satisfaction while still allowing randomized exploration. We validate the proposed method in benchmark environments and in a real quadruped locomotion task. Our method outperforms the baselines, achieving higher reward and faster inference. The real-robot experiments demonstrate the effectiveness of the method and its potential for practical application.
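The abstract does not spell out the construction, so the following is only a minimal sketch of one way a Dirichlet policy can keep every sampled action feasible. It assumes the constrained action set is a bounded convex polytope described by its vertices (one possible reading of "dual space"), and that the policy outputs Dirichlet concentration parameters. The names `VERTICES`, `sample_action`, and `satisfies_constraint`, the example constraint |a1| + |a2| <= 1, and the fixed `alpha` values are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical 2-D action constraint: |a1| + |a2| <= 1 (a diamond).
# Its four vertices are listed explicitly; this vertex form is an assumption.
VERTICES = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
])


def sample_action(alpha, vertices, rng):
    """Sample convex weights from Dirichlet(alpha) and mix the vertices.

    The weights are non-negative and sum to one, so the resulting action
    is a convex combination of the vertices and lies inside the polytope.
    """
    weights = rng.dirichlet(alpha)
    return weights @ vertices, weights


def satisfies_constraint(action, tol=1e-9):
    """Check the hypothetical constraint |a1| + |a2| <= 1."""
    return np.abs(action).sum() <= 1.0 + tol


rng = np.random.default_rng(0)

# In a learned policy these concentrations would come from a network
# conditioned on the state; fixed values are used here for illustration.
alpha = np.array([2.0, 1.0, 0.5, 3.0])

actions = np.array(
    [sample_action(alpha, VERTICES, rng)[0] for _ in range(1000)]
)
all_feasible = all(satisfies_constraint(a) for a in actions)
```

In this sketch feasibility comes from the sampling rule itself, so no projection or clipping step is needed after sampling. Exploration comes from the spread of the Dirichlet distribution over the convex weights.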
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.