SYNLOCO‐VE: Synthesizing central pattern generator with reinforcement learning and velocity estimator for quadruped locomotion

Optimal Control Applications and Methods, Pub Date: 2024-07-10, DOI: 10.1002/oca.3181

Xinyu Zhang, Zhiyuan Xiao, Xiang Zhou, Qingrui Zhang



Abstract

Learning a robust and natural locomotion controller for quadruped robots across different terrains and velocities is challenging, and the task becomes harder still when no exteroceptive sensors are available. This article therefore investigates learning‐based locomotion control for quadruped robots that use only proprioceptive sensors. A new framework called SYNLOCO‐VE is proposed, which synthesizes a feedforward gait planner, a trunk velocity estimator, and reinforcement learning (RL). The feedforward gait planner is based on the well‐known central pattern generator, but it can change the foot length to improve velocity tracking. The trunk velocity estimator is based on deep learning and estimates the trunk velocity from historical proprioceptive sensor data. Introducing the estimator mitigates the partial observation issue caused by the lack of exteroceptive sensors. RL is employed to learn a feedback controller that regulates the robot's gaits using feedback from the proprioceptive sensors and the estimated trunk velocity. In the proposed framework, the feedforward gait planner also guides RL training, resulting in more stable and faster policy learning. Ablation studies demonstrate the effectiveness of the different modules in the proposed design. Extensive experiments are performed on a Go1 quadruped robot, which has only proprioceptive sensors. The proposed framework learns robust and stable locomotion across different terrains and tasks. Experimental comparisons also illustrate the advantages of the proposed design over state‐of‐the‐art methods.
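
To make the feedforward-plus-feedback structure described in the abstract more concrete, the following is a minimal Python sketch of a phase-based, CPG-style gait planner whose step length scales with the commanded speed, together with a function that adds a feedback correction to its output. This is not the authors' implementation. The trot phase offsets, the half-sine swing arc, the rule `step_length = commanded_speed * duty / frequency`, and all function and parameter names are assumptions made for illustration, and the abstract's adjustable "foot length" is interpreted here as step length. In the framework the correction would come from the learned RL policy; here it is only a placeholder argument.

```python
import math

# Assumed trot pattern: phase offsets (fraction of a gait cycle)
# for legs FL, FR, RL, RR. Diagonal legs move together.
TROT_OFFSETS = (0.0, 0.5, 0.5, 0.0)


def foot_target(phase, step_length, step_height=0.05, duty=0.5):
    """Return an (x, z) foot offset in metres for a gait phase.

    Stance (phase < duty): the foot slides backwards on the ground.
    Swing (phase >= duty): the foot moves forwards along a half-sine arc.
    """
    phase = phase % 1.0
    if phase < duty:
        s = phase / duty  # progress through stance, 0..1
        return step_length * (0.5 - s), 0.0
    s = (phase - duty) / (1.0 - duty)  # progress through swing, 0..1
    return step_length * (s - 0.5), step_height * math.sin(math.pi * s)


def feedforward_targets(t, frequency, commanded_speed, duty=0.5):
    """Feedforward foot targets for all four legs at time t (seconds)."""
    # Assumed rule: the stance sweep covers the distance the trunk
    # travels during one stance period at the commanded speed.
    step_length = commanded_speed * duty / frequency
    base_phase = (t * frequency) % 1.0
    return [
        foot_target(base_phase + offset, step_length, duty=duty)
        for offset in TROT_OFFSETS
    ]


def combine(feedforward, residual):
    """Add a per-leg feedback correction to the feedforward targets.

    `residual` stands in for the output of a learned feedback policy.
    """
    return [
        (fx + rx, fz + rz)
        for (fx, fz), (rx, rz) in zip(feedforward, residual)
    ]


if __name__ == "__main__":
    targets = feedforward_targets(t=0.1, frequency=2.0, commanded_speed=0.8)
    zero_residual = [(0.0, 0.0)] * 4  # placeholder for the policy output
    for leg, target in zip(("FL", "FR", "RL", "RR"), combine(targets, zero_residual)):
        print(leg, target)
```

Tying the step length to the commanded speed is one simple way a planner of this kind could support velocity tracking: a faster command lengthens the stance sweep without changing the gait frequency. The learned feedback term then only has to correct the nominal pattern rather than generate the whole motion.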

