Towards Third-Person Visual Imitation Learning Using Generative Adversarial Networks
Luca Garello, F. Rea, Nicoletta Noceti, A. Sciutti
2022 IEEE International Conference on Development and Learning (ICDL), 12 September 2022. DOI: 10.1109/ICDL53763.2022.9962214
Imitation learning plays a key role in our development, since it allows us to learn from more expert agents. This cognitive ability implies remapping observed actions into our own perspective. However, in robotics the perspective mismatch between demonstrator and imitator is usually neglected, under the assumption that the imitator has access to the demonstrator's explicit joint configuration or that both share the same view of the environment. Focusing on the perspective translation problem, in this paper we propose a generative approach that shifts the perspective of actions from third person to first person using RGB videos. In addition to the first-person view of the action, our model generates an embedded representation of it. This numerical description is learnt autonomously, following a time-consistent pattern and without the need for human supervision. In the experimental evaluation, we show that it is possible to exploit these two sources of information to infer robot control during the imitation phase. Additionally, after training on synthetic data, we tested our model in a real scenario.
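The abstract describes a generative model that maps a third-person RGB view of an action to a first-person view plus a learned embedding. The following is a minimal, hypothetical PyTorch sketch of that kind of perspective-translation GAN, not the authors' implementation: the class names (PerspectiveGenerator, Discriminator), the 64x64 image size, the 128-dimensional embedding, and the pix2pix-style adversarial + L1 loss combination are all illustrative assumptions.

```python
# Hypothetical sketch of a third- to first-person perspective-translation GAN.
# An encoder compresses a third-person frame into an action embedding, a
# decoder generates the corresponding first-person view, and a discriminator
# judges whether a first-person frame is real or generated.
import torch
import torch.nn as nn


class PerspectiveGenerator(nn.Module):
    """Encoder-decoder: third-person frame -> (embedding, first-person frame)."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                 # 3x64x64 -> embed_dim
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, embed_dim),
        )
        self.decoder = nn.Sequential(                 # embed_dim -> 3x64x64
            nn.Linear(embed_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, third_person):
        embedding = self.encoder(third_person)        # numeric action description
        first_person = self.decoder(embedding)        # translated viewpoint
        return embedding, first_person


class Discriminator(nn.Module):
    """Real/fake classifier over first-person frames."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 1),
        )

    def forward(self, frame):
        return self.net(frame)                        # real/fake logit


if __name__ == "__main__":
    gen, disc = PerspectiveGenerator(), Discriminator()
    third = torch.randn(4, 3, 64, 64)                 # batch of third-person frames
    real_first = torch.randn(4, 3, 64, 64)            # paired first-person frames
    bce = nn.BCEWithLogitsLoss()

    emb, fake_first = gen(third)
    # Adversarial + reconstruction losses (a common pix2pix-style combination,
    # assumed here for illustration only).
    g_loss = bce(disc(fake_first), torch.ones(4, 1)) + \
        nn.functional.l1_loss(fake_first, real_first)
    d_loss = bce(disc(real_first), torch.ones(4, 1)) + \
        bce(disc(fake_first.detach()), torch.zeros(4, 1))
    print(g_loss.item(), d_loss.item())
```

In such a setup, the embedding returned alongside the generated first-person frame is the kind of compact numerical description that could then be fed to a downstream controller during the imitation phase; the time-consistency constraint mentioned in the abstract would be imposed through an additional loss over consecutive frames, which is omitted here.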