Reinforcement learning with imitative behaviors for humanoid robots navigation: synchronous planning and control

IF 4.3 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)

Autonomous Robots · Pub Date: 2024-04-17 · DOI: 10.1007/s10514-024-10160-w

Xiaoying Wang, Tong Zhang

Volume 48, Issue 2-3 · Journal Article · https://link.springer.com/article/10.1007/s10514-024-10160-w

Citations: 0

Abstract



Humanoid robots adapt well to complex environments and possess human-like flexibility, enabling them to perform precise farming and harvesting tasks on terrain of varying depth. They serve as essential tools for agricultural intelligence. This article proposes a novel method to improve the robustness of autonomous navigation for humanoid robots by fusing data across the footprint planning and control levels. In particular, a fine-tuned deep reinforcement learning model, Proximal Policy Optimization (PPO), is introduced into this layer, before which a heuristic trajectory is generated by imitation learning. During the RL phase, the KL divergence between the agent's policy and the imitative expert policy is added to the advantage function as a value penalty. As a proof of concept, the navigation policy is trained in a robotic simulator and then successfully applied to the physical robot GTX for indoor multi-mode navigation. The experimental results indicate that incorporating imitation learning imparts anthropomorphic attributes to the robot and facilitates the generation of seamless footstep patterns. A significant improvement of 21.56% is observed in the y-direction ZMP trajectory relative to the center. Additionally, the method improves dynamic locomotion stability: the body attitude angle stays within ±5.5°, compared to ±48.4° with the traditional algorithm. Overall, the navigation error is below 5 cm, as verified in the experiments. The proposed framework can provide a reference for researchers studying autonomous navigation applications of humanoid robots on uneven ground.
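The abstract states only that a KL divergence between the agent's policy and the imitative expert policy enters the advantage function as a penalty; it does not give the formula. The sketch below is one plausible reading, not the authors' implementation. It assumes discrete action distributions, the direction KL(agent || expert), and a hypothetical weighting coefficient `beta`; the function names are illustrative as well. The clipped surrogate is the standard PPO objective.

```python
import math


def categorical_kl(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(
        pi * (math.log(max(pi, eps)) - math.log(max(qi, eps)))
        for pi, qi in zip(p, q)
    )


def kl_penalized_advantages(advantages, agent_probs, expert_probs, beta=0.1):
    """Subtract beta * KL(agent || expert) from each per-state advantage.

    beta is an assumed penalty weight; the abstract does not specify one.
    """
    return [
        adv - beta * categorical_kl(pa, pe)
        for adv, pa, pe in zip(advantages, agent_probs, expert_probs)
    ]


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over the batch (to be maximized)."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    # Two states, two actions each. The agent matches the expert in the
    # first state and deviates from it in the second.
    agent = [[0.5, 0.5], [0.9, 0.1]]
    expert = [[0.5, 0.5], [0.5, 0.5]]
    raw_advantages = [1.0, 1.0]

    penalized = kl_penalized_advantages(raw_advantages, agent, expert, beta=0.1)
    print("penalized advantages:", penalized)
    print("clipped objective:", ppo_clip_objective([1.1, 0.9], penalized))
```

Under these assumptions, a state where the agent already matches the expert keeps its advantage unchanged, while a state where it deviates has its advantage reduced, which nudges the policy update toward the imitated behavior.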


Source journal

Autonomous Robots (Engineering and Technology: Robotics)

CiteScore: 7.90

Self-citation rate: 5.70%

Articles published:

Review time: 3 months

About the journal: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.