Trajectory based Deep Policy Search for Quadrupedal Walking

Shishir N. Y. Kolathaya, A. Ghosal, B. Amrutur, Ashish Joglekar, Suhan Shetty, Dhaivat Dholakiya, Abhimanyu, Aditya Sagi, Shounak Bhattacharya, Abhik Singla, S. Bhatnagar

2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 2019. DOI: 10.1109/RO-MAN46459.2019.8956369
In this paper, we explore a specific form of deep reinforcement learning (D-RL) for quadrupedal walking: trajectory-based policy search via deep policy networks. Existing approaches determine optimal policies for each time step, whereas we propose to determine an optimal policy for each walking step. We justify our approach based on the fact that animals, including humans, use "low"-dimensional trajectories at the joint level to realize walking. We construct these trajectories using Bézier polynomials, with the coefficients determined by a parameterized policy. To maintain smoothness of the trajectories across step transitions, hybrid invariance conditions are also applied. The action is computed at the beginning of every step, and a linear PD control law is applied to track the resulting trajectories at the individual joints. After each step, a reward is computed and used to update the policy parameters for the next step. After learning an optimal policy, i.e., an optimal walking gait for each step, we successfully deploy the learned gaits on a custom-built quadruped robot, Stoch 2, thereby validating our approach.
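To make the per-step pipeline concrete, here is a minimal sketch that mirrors the structure the abstract describes: the policy outputs Bézier coefficients once per walking step, and a linear PD law tracks the resulting joint trajectories until the next step boundary. The joint count, polynomial degree, PD gains, and the `env`/`policy` interfaces are all hypothetical placeholders, not details taken from the paper.

```python
import numpy as np
from math import comb

def bezier(alpha, tau):
    """Evaluate a Bezier polynomial with coefficients alpha at phase tau in [0, 1]."""
    n = len(alpha) - 1
    return sum(a * comb(n, k) * tau**k * (1 - tau)**(n - k) for k, a in enumerate(alpha))

def bezier_deriv(alpha, tau):
    """Derivative of the Bezier polynomial with respect to the phase tau."""
    n = len(alpha) - 1
    return bezier([n * (alpha[k + 1] - alpha[k]) for k in range(n)], tau)

# Illustrative constants; the abstract does not specify any of these.
NUM_JOINTS = 8        # hypothetical actuated-joint count
DEGREE = 4            # hypothetical Bezier polynomial degree
KP, KD = 50.0, 0.5    # hypothetical PD gains

def walking_step(policy, state, env, step_duration=0.25, dt=0.005):
    """Execute one walking step: the action (Bezier coefficients for every
    joint) is computed once at the start of the step, then a linear PD law
    tracks the resulting joint trajectories until the step ends."""
    # The parameterized policy maps the state at the step boundary to the
    # Bezier coefficients, one curve per joint.
    alpha = policy(state).reshape(NUM_JOINTS, DEGREE + 1)
    # A Bezier curve starts at its first coefficient, so pinning that
    # coefficient to the current joint position is one simple way to keep
    # the trajectory continuous across step transitions.
    q, dq = env.joint_state()                     # hypothetical env interface
    alpha[:, 0] = q
    t, step_reward = 0.0, 0.0
    while t < step_duration:
        tau = t / step_duration                   # phase variable in [0, 1]
        q_des = np.array([bezier(a, tau) for a in alpha])
        dq_des = np.array([bezier_deriv(a, tau) for a in alpha]) / step_duration
        torque = KP * (q_des - q) + KD * (dq_des - dq)   # linear PD tracking law
        state, reward, (q, dq) = env.step(torque, dt)    # hypothetical simulator call
        step_reward += reward
        t += dt
    return state, step_reward
```

Pinning the boundary coefficient is only one way continuity could be enforced; the paper's actual hybrid invariance conditions may differ. In a full training loop, the accumulated per-step reward would then drive an update of the policy parameters before the next walking step.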