Towards Third-Person Visual Imitation Learning Using Generative Adversarial Networks
Luca Garello, F. Rea, Nicoletta Noceti, A. Sciutti
2022 IEEE International Conference on Development and Learning (ICDL), 12 September 2022. DOI: 10.1109/ICDL53763.2022.9962214
Imitation learning plays a key role in our development, since it allows us to learn from more expert agents. This cognitive ability implies remapping observed actions into our own perspective. However, in robotics the perspective mismatch between demonstrator and imitator is usually neglected, under the assumption that the imitator has access to the demonstrator's explicit joint configuration or that both share the same view of the environment. Focusing on the perspective translation problem, in this paper we propose a generative approach that shifts the perspective of actions from third person to first person using RGB videos. In addition to the first-person view of the action, our model generates an embedded representation of it. This numerical description is learnt autonomously, following a time-consistent pattern and without the need for human supervision. In the experimental evaluation, we show that it is possible to exploit these two sources of information to infer robot control during the imitation phase. Additionally, after training on synthetic data, we tested our model in a real scenario.
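The abstract describes a generative model that maps a third-person RGB view of an action to a first-person view plus a learned embedding. The following is a minimal, hypothetical PyTorch sketch of that kind of perspective-translation GAN, not the authors' implementation: the class names (PerspectiveGenerator, Discriminator), the 64x64 image size, the 128-dimensional embedding, and the pix2pix-style adversarial + L1 loss combination are all illustrative assumptions.

```python
# Hypothetical sketch of a third- to first-person perspective-translation GAN.
# An encoder compresses a third-person frame into an action embedding, a
# decoder generates the corresponding first-person view, and a discriminator
# judges whether a first-person frame is real or generated.
import torch
import torch.nn as nn


class PerspectiveGenerator(nn.Module):
    """Encoder-decoder: third-person frame -> (embedding, first-person frame)."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                 # 3x64x64 -> embed_dim
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, embed_dim),
        )
        self.decoder = nn.Sequential(                 # embed_dim -> 3x64x64
            nn.Linear(embed_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, third_person):
        embedding = self.encoder(third_person)        # numeric action description
        first_person = self.decoder(embedding)        # translated viewpoint
        return embedding, first_person


class Discriminator(nn.Module):
    """Real/fake classifier over first-person frames."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 1),
        )

    def forward(self, frame):
        return self.net(frame)                        # real/fake logit


if __name__ == "__main__":
    gen, disc = PerspectiveGenerator(), Discriminator()
    third = torch.randn(4, 3, 64, 64)                 # batch of third-person frames
    real_first = torch.randn(4, 3, 64, 64)            # paired first-person frames
    bce = nn.BCEWithLogitsLoss()

    emb, fake_first = gen(third)
    # Adversarial + reconstruction losses (a common pix2pix-style combination,
    # assumed here for illustration only).
    g_loss = bce(disc(fake_first), torch.ones(4, 1)) + \
        nn.functional.l1_loss(fake_first, real_first)
    d_loss = bce(disc(real_first), torch.ones(4, 1)) + \
        bce(disc(fake_first.detach()), torch.zeros(4, 1))
    print(g_loss.item(), d_loss.item())
```

In such a setup, the embedding returned alongside the generated first-person frame is the kind of compact numerical description that could then be fed to a downstream controller during the imitation phase; the time-consistency constraint mentioned in the abstract would be imposed through an additional loss over consecutive frames, which is omitted here.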