Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments

IEEE Transactions on Autonomous Mental Development. Pub Date: 2012-12-01. DOI: 10.1109/TAMD.2012.2205924

F. Stulp, J. Buchli, Alice Ellmer, M. Mistry, Evangelos A. Theodorou, S. Schaal

Pages: 330-341

Citations: 54

Abstract

For humans and robots, variable impedance control is an essential component for ensuring robust and safe physical interaction with the environment. Humans learn to adapt their impedance to specific tasks and environments, a capability that we continually develop and improve until we are well into our twenties. In this article, we reproduce functionally interesting aspects of learning impedance control in humans on a simulated robot platform. As demonstrated in numerous force field tasks, humans combine two strategies to adapt their impedance to perturbations, thereby minimizing position error and energy consumption: 1) if perturbations are unpredictable, subjects increase their impedance through cocontraction; and 2) if perturbations are predictable, subjects learn a feed-forward command to offset the perturbation. We show how a 7-DOF simulated robot demonstrates similar behavior with our model-free reinforcement learning algorithm PI2, by applying deterministic and stochastic force fields to the robot's end-effector. We show the qualitative similarity between the robot and human movements. Our results provide a biologically plausible approach to learning appropriate impedances purely from experience, without requiring a model of either body or environment dynamics. Not requiring models also facilitates autonomous development for robots, as prespecified models cannot be provided for each environment a robot might encounter.
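The two strategies named in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's 7-DOF robot and does not implement PI2: it hand-sets the gains that the paper learns from experience. The function `mean_abs_error`, the point mass, the critical-damping rule, and all numeric values (stiffness 25 and 400, force magnitude 5, time step 0.01) are assumptions chosen only for illustration.

```python
import numpy as np

def mean_abs_error(stiffness, feedforward, force, dt=0.01, mass=1.0):
    """Hold a 1-D point mass at x = 0 against an external force sequence.

    Control law (impedance term plus feed-forward term):
        u = -stiffness * x - damping * v + feedforward
    Returns the mean absolute position error over the rollout.
    """
    damping = 2.0 * np.sqrt(stiffness * mass)  # critical damping (assumption)
    x, v, total = 0.0, 0.0, 0.0
    for f_ext in force:
        u = -stiffness * x - damping * v + feedforward
        a = (u + f_ext) / mass
        v += a * dt  # semi-implicit Euler step
        x += v * dt
        total += abs(x)
    return total / len(force)

steps = 500
constant_field = np.full(steps, 5.0)             # predictable perturbation
rng = np.random.default_rng(0)
random_field = rng.normal(0.0, 5.0, size=steps)  # unpredictable perturbation

# Predictable field: a feed-forward command cancels it at low stiffness.
err_const_no_ff = mean_abs_error(25.0, 0.0, constant_field)
err_const_ff = mean_abs_error(25.0, -5.0, constant_field)

# Unpredictable field: no feed-forward helps, but higher stiffness does.
err_rand_low = mean_abs_error(25.0, 0.0, random_field)
err_rand_high = mean_abs_error(400.0, 0.0, random_field)

print(err_const_no_ff, err_const_ff, err_rand_low, err_rand_high)
```

In this toy setting the feed-forward term removes the error under the constant field without raising stiffness, while under the random field only the stiffer controller reduces the error. This mirrors the trade-off between position error and effort described in the abstract.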

Source journal

IEEE Transactions on Autonomous Mental Development (Computer Science, Artificial Intelligence; Robotics)

Self-citation rate: 0.00%

Review time: 3 months