DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning

ACM Trans. Graph., published 2017-07-20. DOI: 10.1145/3072959.3073602

X. B. Peng, G. Berseth, KangKang Yin, M. van de Panne

Article 41, pages 41:1-41:13.

Citations: 513

Abstract

Learning physics-based locomotion skills is a difficult problem, leading to solutions that typically exploit prior knowledge of various forms. In this paper we aim to learn a variety of environment-aware locomotion skills with a limited amount of prior knowledge. We adopt a two-level hierarchical control framework. First, low-level controllers are learned that operate at a fine timescale and which achieve robust walking gaits that satisfy stepping-target and style objectives. Second, high-level controllers are then learned which plan at the timescale of steps by invoking desired step targets for the low-level controller. The high-level controller makes decisions directly based on high-dimensional inputs, including terrain maps or other suitable representations of the surroundings. Both levels of the control policy are trained using deep reinforcement learning. Results are demonstrated on a simulated 3D biped. Low-level controllers are learned for a variety of motion styles and demonstrate robustness with respect to force-based disturbances, terrain variations, and style interpolation. High-level controllers are demonstrated that are capable of following trails through terrains, dribbling a soccer ball towards a target location, and navigating through static or dynamic obstacles.
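The two-level structure described in the abstract can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation. Both policies are hand-written stand-ins for the learned deep RL networks, and the point-mass `ToyBiped` replaces the simulated 3D biped. The names `high_level_policy`, `low_level_policy`, `run_episode`, and `LLC_TICKS_PER_STEP`, the 1/30 s tick, and the 0.5 maximum step length are all hypothetical choices made for illustration. The sketch shows only the timescale split: the high-level controller picks one step target per step, and the low-level controller acts many times within that step to reach it.

```python
import math
from dataclasses import dataclass

# Hypothetical: number of fine-timescale (low-level) control ticks per step-level
# (high-level) decision. The actual control rates are not given in the abstract.
LLC_TICKS_PER_STEP = 15


@dataclass
class ToyBiped:
    """Hypothetical 2D point-mass stand-in for the simulated 3D biped."""
    x: float = 0.0
    y: float = 0.0

    def apply(self, vx: float, vy: float, dt: float) -> None:
        # Integrate the commanded velocity over one fine-timescale tick.
        self.x += vx * dt
        self.y += vy * dt


def high_level_policy(pos, goal, max_step=0.5):
    """Hypothetical HLC: once per step, choose a step target toward the goal.

    In DeepLoco this is a learned policy over high-dimensional inputs such as
    terrain maps; here it is a fixed rule that moves at most max_step per step.
    """
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return goal
    scale = max_step / dist
    return (pos[0] + dx * scale, pos[1] + dy * scale)


def low_level_policy(pos, target, ticks_left, dt):
    """Hypothetical LLC: each tick, command a velocity that reaches the
    step target exactly when the current step's ticks run out."""
    remaining_time = ticks_left * dt
    return ((target[0] - pos[0]) / remaining_time,
            (target[1] - pos[1]) / remaining_time)


def run_episode(goal, num_steps, dt=1 / 30):
    """Run the hierarchy: one HLC decision per step, LLC_TICKS_PER_STEP LLC actions per step."""
    body = ToyBiped()
    targets = []
    for _ in range(num_steps):
        # Coarse timescale: the high-level controller sets the next step target.
        target = high_level_policy((body.x, body.y), goal)
        targets.append(target)
        # Fine timescale: the low-level controller tracks that target.
        for tick in range(LLC_TICKS_PER_STEP):
            ticks_left = LLC_TICKS_PER_STEP - tick  # always >= 1
            vx, vy = low_level_policy((body.x, body.y), target, ticks_left, dt)
            body.apply(vx, vy, dt)
    return (body.x, body.y), targets
```

For example, `run_episode((2.0, 0.0), num_steps=4)` produces step targets near x = 0.5, 1.0, 1.5 and 2.0 and ends at the goal. In the paper, the high-level policy would instead read terrain maps or other representations of the surroundings, and the low-level policy would produce actuation for the biped while also satisfying a style objective.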


