Developmentally Synthesizing Earthworm-Like Locomotion Gaits with Bayesian-Augmented Deep Deterministic Policy Gradients (DDPG)

Sayyed Jaffar Ali Raza, Apan Dastider, Mingjie Lin
DOI: 10.1109/CASE48305.2020.9216782
Published in: 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), August 2020
Citations: 2

Abstract

In this paper, a reinforcement learning method is presented for generating earthworm-like gaits on a hyper-redundant, earthworm-like manipulator robot. Partially inspired by the human brain's learning mechanism, the proposed framework first builds a preliminary belief by adapting rudimentary gaits governed by generic kinematic knowledge of undulatory, sidewinding, and circular patterns. This preliminary belief is then represented as a prior ensemble, and new gaits are learned by leveraging that a priori knowledge and inferring a posterior over the prior distribution. While the basic idea of combining Bayesian learning with reinforcement learning is not new, this paper extends the Bayesian actor-critic approach by introducing an augmented, prior-based directed bias into the policy search, yielding faster parameter learning and reduced sampling requirements. We show results on an in-house-built 10-DoF earthworm-like robot that exhibits adaptive development, qualitatively learning different locomotion modes when given only rudimentary generic gait behaviors. The results are compared against the deep deterministic policy gradient (DDPG) method for continuous control as the baseline. We show that the proposed method outperforms DDPG and achieves faster kinematic indexes across the various gaits.
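The core idea described above — fitting a prior over rudimentary gait parameters and using it to bias subsequent policy search — can be illustrated with a minimal sketch. This is not the paper's implementation: the gait vectors, the reward, and the hill-climbing search below are all invented placeholders, standing in for the actual Bayesian actor-critic machinery. What the sketch shows is only the directed-bias mechanism: each candidate policy parameter is perturbed both randomly and toward the prior mean built from the rudimentary gaits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10-DoF joint-phase parameters for three rudimentary gaits.
# These vectors are illustrative placeholders, not the paper's gait data.
rudimentary_gaits = np.stack([
    np.sin(np.linspace(0, 2 * np.pi, 10)),           # undulatory-like pattern
    np.sign(np.sin(np.linspace(0, 4 * np.pi, 10))),  # sidewinding-like pattern
    np.cos(np.linspace(0, 2 * np.pi, 10)),           # circular-like pattern
])

# Prior ensemble: a per-dimension Gaussian fitted over the rudimentary gaits.
prior_mean = rudimentary_gaits.mean(axis=0)
prior_std = rudimentary_gaits.std(axis=0) + 1e-3

def reward(theta):
    """Toy reward: closeness to a target gait (a stand-in for locomotion speed)."""
    target = np.sin(np.linspace(0, 2 * np.pi, 10) + 0.3)
    return -np.sum((theta - target) ** 2)

def biased_policy_search(theta_init, steps=500, lr=0.05, beta=0.1):
    """Hill climbing whose proposals are biased toward the prior mean.

    beta controls the prior-based directed bias: every proposal mixes a random
    perturbation (scaled by the prior's spread) with a pull toward the prior
    mean, mimicking in spirit the paper's augmented prior bias in policy search.
    """
    theta = theta_init.copy()
    best = reward(theta)
    for _ in range(steps):
        pull = beta * (prior_mean - theta)                       # directed bias
        candidate = theta + lr * rng.normal(scale=prior_std) + pull
        r = reward(candidate)
        if r > best:                                             # greedy accept
            theta, best = candidate, r
    return theta, best

theta0 = rng.normal(size=10)                  # naive random initial policy
theta, best = biased_policy_search(theta0)
print("initial reward:", reward(theta0), "final reward:", best)
```

In this toy setting the prior pull steers early proposals away from the random initialization and toward plausible gait shapes; in the paper this role is played by posterior inference over the prior ensemble inside the actor-critic update, which is what reduces sampling requirements relative to plain DDPG.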