A Novel Actor–Critic Motor Reinforcement Learning for Continuum Soft Robots

Robotics | IF 2.9, Q2 (ROBOTICS) | Pub Date: 2023-10-09 | DOI: 10.3390/robotics12050141
Luis Pantoja-Garcia, Vicente Parra-Vega, Rodolfo Garcia-Rodriguez, Carlos Ernesto Vázquez-García
Citations: 1

Abstract

Reinforcement learning (RL) is explored for motor control of a novel pneumatic-driven soft robot modeled as a continuum medium with varying density. This model complies with closed-form Lagrangian dynamics and fulfills fundamental structural properties such as passivity. The question then arises of how to synthesize a passivity-based RL scheme that controls the unknown continuum soft-robot dynamics while exploiting its input–output energy properties through a reward-based neural network controller. We therefore propose a continuous-time Actor–Critic scheme for tracking tasks of a continuum 3D soft robot subject to Lipschitz disturbances. A reward-based temporal difference drives learning through a novel discontinuous adaptive mechanism for the Critic neural weights, while the reward and the integral of the Bellman error approximation reinforce the adaptive mechanism of the Actor neural weights. Closed-loop stability is guaranteed in the sense of Lyapunov, yielding local exponential convergence of tracking errors based on integral sliding modes. Notably, the dynamics are assumed unknown, yet the control remains continuous and robust. A representative simulation study shows the effectiveness of the proposal for tracking tasks.
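The continuous-time Actor–Critic idea summarized in the abstract can be sketched in a few lines. The following is an illustrative toy only, not the paper's actual adaptation laws: the Critic weights adapt from a reward-based temporal-difference (Bellman) error, the Actor weights are reinforced by the same signal, and the continuous-time updates are integrated by forward Euler. The RBF basis, all gains, the control saturation, and the first-order surrogate plant are assumptions made purely for illustration.

```python
import numpy as np

def rbf(x, centers, width=1.0):
    """Radial basis features approximating both value and policy."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

centers = np.linspace(-2.0, 2.0, 7)
Wc = np.zeros(7)                 # Critic weights (value approximation)
Wa = np.zeros(7)                 # Actor weights (policy approximation)
gamma, ac, aa = 0.9, 1.0, 0.3    # discount and Critic/Actor learning rates
dt, x, xd = 0.01, 0.5, 0.0       # Euler step, state, desired state

for _ in range(2000):
    phi = rbf(x, centers)
    u = np.clip(Wa @ phi, -2.0, 2.0)      # Actor action, saturated like a real actuator
    e = x - xd
    r = -e ** 2 - 0.01 * u ** 2           # reward penalizes error and effort
    x_next = x + dt * (-x + u)            # toy stable first-order surrogate plant
    # Temporal-difference (Bellman) error on the approximated value function
    delta = r + gamma * (Wc @ rbf(x_next, centers)) - Wc @ phi
    Wc += dt * ac * delta * phi           # Critic adaptation driven by TD error
    Wa += dt * aa * delta * phi           # Actor reinforced by the same TD signal
    x = x_next

print(f"final |x - xd| = {abs(x - xd):.4f}")
```

The paper's scheme differs in essential ways this sketch cannot capture: the adaptation of the Critic weights is discontinuous, the Actor update uses the integral of the Bellman error approximation, and stability is established via Lyapunov analysis with integral sliding modes rather than observed empirically.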
Source journal: Robotics
Categories: Robotics; Mathematics-Control and Optimization
CiteScore: 6.70
Self-citation rate: 8.10%
Articles per year: 114
Review time: 11 weeks
Journal description: Robotics publishes original papers, technical reports, case studies, review papers and tutorials in all aspects of robotics. Special Issues devoted to important topics in advanced robotics will be published from time to time. It particularly welcomes emerging methodologies and techniques which bridge theoretical studies and applications and have significant potential for real-world applications. It provides a forum for information exchange between professionals, academicians and engineers working in the area of robotics, helping them to disseminate research findings and to learn from each other's work. Suitable topics include, but are not limited to:
- intelligent robotics, mechatronics, and biomimetics
- novel and biologically-inspired robotics
- modelling, identification and control of robotic systems
- biomedical, rehabilitation and surgical robotics
- exoskeletons, prosthetics and artificial organs
- AI, neural networks and fuzzy logic in robotics
- multimodality human-machine interaction
- wireless sensor networks for robot navigation
- multi-sensor data fusion and SLAM