Trajectory based Deep Policy Search for Quadrupedal Walking

Shishir N. Y. Kolathaya, A. Ghosal, B. Amrutur, Ashish Joglekar, Suhan Shetty, Dhaivat Dholakiya, Abhimanyu, Aditya Sagi, Shounak Bhattacharya, Abhik Singla, S. Bhatnagar
{"title":"Trajectory based Deep Policy Search for Quadrupedal Walking","authors":"Shishir N. Y. Kolathaya, A. Ghosal, B. Amrutur, Ashish Joglekar, Suhan Shetty, Dhaivat Dholakiya, Abhimanyu, Aditya Sagi, Shounak Bhattacharya, Abhik Singla, S. Bhatnagar","doi":"10.1109/RO-MAN46459.2019.8956369","DOIUrl":null,"url":null,"abstract":"In this paper, we explore a specific form of deep reinforcement learning (D-RL) technique for quadrupedal walking—trajectory based policy search via deep policy networks. Existing approaches determine optimal policies for each time step, whereas we propose to determine an optimal policy for each walking step. We justify our approach based on the fact that animals including humans use “low” dimensional trajectories at the joint level to realize walking. We will construct these trajectories by using Bézier polynomials, with the coefficients being determined by a parameterized policy. In order to maintain smoothness of the trajectories during step transitions, hybrid invariance conditions are also applied. The action is computed at the beginning of every step, and a linear PD control law is applied to track at the individual joints. After each step, reward is computed, which is then used to update the new policy parameters for the next step. After learning an optimal policy, i.e., an optimal walking gait for each step, we then successfully play them in a custom built quadruped robot, Stoch 2, thereby validating our approach.","PeriodicalId":286478,"journal":{"name":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RO-MAN46459.2019.8956369","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In this paper, we explore a specific form of deep reinforcement learning (D-RL) for quadrupedal walking: trajectory-based policy search via deep policy networks. Existing approaches determine an optimal policy for each time step, whereas we propose to determine an optimal policy for each walking step. We justify our approach by the fact that animals, including humans, use "low"-dimensional trajectories at the joint level to realize walking. We construct these trajectories using Bézier polynomials, with the coefficients determined by a parameterized policy. To maintain smoothness of the trajectories across step transitions, hybrid invariance conditions are also applied. The action is computed at the beginning of every step, and a linear PD control law tracks the resulting trajectory at the individual joints. After each step, a reward is computed and used to update the policy parameters for the next step. After learning an optimal policy, i.e., an optimal walking gait for each step, we successfully deploy it on a custom-built quadruped robot, Stoch 2, thereby validating our approach.
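
The abstract compresses several moving parts: a policy that outputs Bézier coefficients once per walking step, hybrid invariance conditions that keep trajectories smooth across step transitions, joint-level PD tracking, and a per-step reward that drives the parameter update. The Python sketch below illustrates that structure under stated assumptions; it is not the authors' implementation, and every name and constant in it (`NUM_JOINTS`, `BEZIER_DEGREE`, the gains, the REINFORCE-style estimator) is hypothetical.

```python
# Minimal sketch of trajectory-based policy search: one action per
# walking step (a set of Bezier coefficients), PD tracking at the
# joints, and a per-step reward-driven parameter update.
from math import comb

import numpy as np

NUM_JOINTS = 8       # e.g., hips and knees of a quadruped (assumed)
BEZIER_DEGREE = 4    # degree of each per-joint Bezier polynomial (assumed)
NOISE_STD = 0.05     # exploration noise on the coefficients (assumed)


def bezier(coeffs, s):
    """Evaluate a Bezier polynomial with control points `coeffs`
    at phase s in [0, 1], using the Bernstein basis."""
    n = len(coeffs) - 1
    basis = np.array(
        [comb(n, k) * s**k * (1 - s) ** (n - k) for k in range(n + 1)]
    )
    return float(basis @ coeffs)


def pd_torque(q_des, q, dq_des, dq, kp=50.0, kd=1.0):
    """Linear PD tracking law applied independently at each joint."""
    return kp * (q_des - q) + kd * (dq_des - dq)


def sample_action(theta):
    """One 'action' per walking step: the full coefficient matrix,
    perturbed for exploration. Returns the action and the noise.
    Hybrid-invariance-style smoothness can be imposed here by pinning
    the first control point of the new step to the final posture of
    the previous step (a simplification of the paper's conditions)."""
    eps = np.random.randn(*theta.shape) * NOISE_STD
    return theta + eps, eps


def reinforce_update(theta, eps, reward, baseline, lr=1e-2):
    """REINFORCE-style update from the scalar per-step reward; one of
    several estimators compatible with the per-step setup."""
    return theta + lr * (reward - baseline) * eps / NOISE_STD**2


# Toy usage: track the sampled trajectory over one walking step.
theta = np.zeros((NUM_JOINTS, BEZIER_DEGREE + 1))
coeffs, eps = sample_action(theta)
for s in np.linspace(0.0, 1.0, 50):          # phase over one step
    q_des = np.array([bezier(c, s) for c in coeffs])
    # q, dq would come from the robot's encoders; zeros stand in here.
    tau = pd_torque(q_des, np.zeros(NUM_JOINTS), 0.0, np.zeros(NUM_JOINTS))
reward = 0.0                                  # placeholder per-step reward
theta = reinforce_update(theta, eps, reward, baseline=0.0)
```

Acting once per walking step, rather than once per control tick, keeps the action space small: in this sketch the action is an 8 x 5 coefficient matrix per step instead of a torque vector at every controller update.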