Trajectory based Deep Policy Search for Quadrupedal Walking

Shishir N. Y. Kolathaya, A. Ghosal, B. Amrutur, Ashish Joglekar, Suhan Shetty, Dhaivat Dholakiya, Abhimanyu, Aditya Sagi, Shounak Bhattacharya, Abhik Singla, S. Bhatnagar

2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 2019. DOI: 10.1109/RO-MAN46459.2019.8956369
In this paper, we explore a specific form of deep reinforcement learning (D-RL) for quadrupedal walking: trajectory-based policy search via deep policy networks. Existing approaches determine optimal policies for each time step, whereas we propose to determine an optimal policy for each walking step. We justify our approach based on the fact that animals, including humans, use "low"-dimensional trajectories at the joint level to realize walking. We construct these trajectories using Bézier polynomials, with the coefficients determined by a parameterized policy. To maintain smoothness of the trajectories across step transitions, hybrid invariance conditions are also applied. The action is computed at the beginning of every step, and a linear PD control law is applied to track the resulting trajectories at the individual joints. After each step, a reward is computed and used to update the policy parameters for the next step. After learning an optimal policy, i.e., an optimal walking gait for each step, we successfully deploy the learned gaits on a custom-built quadruped robot, Stoch 2, thereby validating our approach.
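To make the per-step pipeline concrete, here is a minimal sketch that mirrors the structure the abstract describes: the policy outputs Bézier coefficients once per walking step, and a linear PD law tracks the resulting joint trajectories until the next step boundary. The joint count, polynomial degree, PD gains, and the `env`/`policy` interfaces are all hypothetical placeholders, not details taken from the paper.

```python
import numpy as np
from math import comb

def bezier(alpha, tau):
    """Evaluate a Bezier polynomial with coefficients alpha at phase tau in [0, 1]."""
    n = len(alpha) - 1
    return sum(a * comb(n, k) * tau**k * (1 - tau)**(n - k) for k, a in enumerate(alpha))

def bezier_deriv(alpha, tau):
    """Derivative of the Bezier polynomial with respect to the phase tau."""
    n = len(alpha) - 1
    return bezier([n * (alpha[k + 1] - alpha[k]) for k in range(n)], tau)

# Illustrative constants; the abstract does not specify any of these.
NUM_JOINTS = 8        # hypothetical actuated-joint count
DEGREE = 4            # hypothetical Bezier polynomial degree
KP, KD = 50.0, 0.5    # hypothetical PD gains

def walking_step(policy, state, env, step_duration=0.25, dt=0.005):
    """Execute one walking step: the action (Bezier coefficients for every
    joint) is computed once at the start of the step, then a linear PD law
    tracks the resulting joint trajectories until the step ends."""
    # The parameterized policy maps the state at the step boundary to the
    # Bezier coefficients, one curve per joint.
    alpha = policy(state).reshape(NUM_JOINTS, DEGREE + 1)
    # A Bezier curve starts at its first coefficient, so pinning that
    # coefficient to the current joint position is one simple way to keep
    # the trajectory continuous across step transitions.
    q, dq = env.joint_state()                     # hypothetical env interface
    alpha[:, 0] = q
    t, step_reward = 0.0, 0.0
    while t < step_duration:
        tau = t / step_duration                   # phase variable in [0, 1]
        q_des = np.array([bezier(a, tau) for a in alpha])
        dq_des = np.array([bezier_deriv(a, tau) for a in alpha]) / step_duration
        torque = KP * (q_des - q) + KD * (dq_des - dq)   # linear PD tracking law
        state, reward, (q, dq) = env.step(torque, dt)    # hypothetical simulator call
        step_reward += reward
        t += dt
    return state, step_reward
```

Pinning the boundary coefficient is only one way continuity could be enforced; the paper's actual hybrid invariance conditions may differ. In a full training loop, the accumulated per-step reward would then drive an update of the policy parameters before the next walking step.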