Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre
{"title":"机器人视觉语义导航的清晰评价","authors":"Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre","doi":"10.1109/ICARA56516.2023.10125866","DOIUrl":null,"url":null,"abstract":"In this paper we address the problem of visual semantic navigation (VSN), in which a robot needs to navigate through an environment to reach an object having only access to egocentric RGB perception sensors. This is a recently explored problem, where most of the approaches leverage last advances in deep learning models for visual perception, combined with reinforcement learning (RL) strategies. Nonetheless, after a review of the literature, it is complicated to perform direct comparisons between the different solutions. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses different RL libraries. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been constructed using pyRIL, an open source python library for RL, and two navigation environments: Miniwolrd-Maze from gym-miniworld, and one 3D scene from HM3D dataset using AI Habitat simulator. We finally propose a state-of-the-art VSN model, consisting in a Contrastive Language Image Pretraining (CLIP) visual encoder plus a set of two recurrent neural networks for producing the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely: the sparse rewards problem; and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.","PeriodicalId":443572,"journal":{"name":"2023 9th International Conference on Automation, Robotics and Applications (ICARA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Clear Evaluation of Robotic Visual Semantic Navigation\",\"authors\":\"Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre\",\"doi\":\"10.1109/ICARA56516.2023.10125866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we address the problem of visual semantic navigation (VSN), in which a robot needs to navigate through an environment to reach an object having only access to egocentric RGB perception sensors. This is a recently explored problem, where most of the approaches leverage last advances in deep learning models for visual perception, combined with reinforcement learning (RL) strategies. Nonetheless, after a review of the literature, it is complicated to perform direct comparisons between the different solutions. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses different RL libraries. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been constructed using pyRIL, an open source python library for RL, and two navigation environments: Miniwolrd-Maze from gym-miniworld, and one 3D scene from HM3D dataset using AI Habitat simulator. 
We finally propose a state-of-the-art VSN model, consisting in a Contrastive Language Image Pretraining (CLIP) visual encoder plus a set of two recurrent neural networks for producing the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely: the sparse rewards problem; and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.\",\"PeriodicalId\":443572,\"journal\":{\"name\":\"2023 9th International Conference on Automation, Robotics and Applications (ICARA)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 9th International Conference on Automation, Robotics and Applications (ICARA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICARA56516.2023.10125866\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 9th International Conference on Automation, Robotics and Applications (ICARA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARA56516.2023.10125866","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards Clear Evaluation of Robotic Visual Semantic Navigation
In this paper we address the problem of visual semantic navigation (VSN), in which a robot must navigate through an environment to reach a target object with access only to egocentric RGB perception. This is a recently explored problem, and most approaches leverage the latest advances in deep learning models for visual perception combined with reinforcement learning (RL) strategies. Nonetheless, a review of the literature shows that direct comparisons between the different solutions are difficult. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses a different RL library. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been built with pyRIL, an open-source Python library for RL, and two navigation environments: MiniWorld-Maze from gym-miniworld, and one 3D scene from the HM3D dataset using the AI Habitat simulator. Finally, we propose a state-of-the-art VSN model, consisting of a Contrastive Language-Image Pretraining (CLIP) visual encoder plus two recurrent neural networks that produce the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely the sparse-reward problem and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.
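As a minimal illustration of the first navigation environment, the sketch below loads MiniWorld-Maze through gym-miniworld and steps it with a random policy. The environment id, the legacy Gym API, and the wrapper-free usage are assumptions made for illustration; the released benchmark may configure and wrap the scene differently.

```python
# Minimal sketch: load the MiniWorld-Maze environment and run one random episode.
# Assumes the legacy gym-miniworld package and the classic (pre-0.26) Gym API.
import gym
import gym_miniworld  # importing registers the MiniWorld-* environments

env = gym.make("MiniWorld-Maze-v0")

obs = env.reset()                          # egocentric RGB observation, shape (H, W, 3)
done = False
while not done:
    action = env.action_space.sample()     # random discrete action, for illustration only
    obs, reward, done, info = env.step(action)
env.close()
```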
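The abstract only outlines the model architecture, so the following is a speculative PyTorch sketch of a CLIP-based recurrent policy: a frozen CLIP image encoder feeding two recurrent networks, read here as an actor/critic pair. The ViT-B/32 backbone, the GRU cells, and the actor/critic interpretation are assumptions, not details from the paper; the actual model is in the linked repository.

```python
# Speculative sketch of a VSN policy: frozen CLIP visual encoder + two GRUs
# (one driving the discrete-action head, one driving the value head).
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: https://github.com/openai/CLIP


class CLIPRecurrentPolicy(nn.Module):
    def __init__(self, num_actions: int, hidden_size: int = 512, device: str = "cpu"):
        super().__init__()
        # Frozen CLIP visual encoder producing a 512-d embedding per frame (ViT-B/32 assumed).
        self.clip_model, self.preprocess = clip.load("ViT-B/32", device=device)
        for p in self.clip_model.parameters():
            p.requires_grad = False

        # Two recurrent networks: one feeds the action (actor) head,
        # the other feeds the state-value (critic) head.
        self.actor_rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.critic_rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.action_head = nn.Linear(hidden_size, num_actions)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, frames, actor_h=None, critic_h=None):
        # frames: (B, T, 3, 224, 224) CLIP-preprocessed egocentric RGB sequence.
        b, t = frames.shape[:2]
        with torch.no_grad():
            feats = self.clip_model.encode_image(frames.flatten(0, 1)).float()
        feats = feats.view(b, t, -1)

        actor_out, actor_h = self.actor_rnn(feats, actor_h)
        critic_out, critic_h = self.critic_rnn(feats, critic_h)

        logits = self.action_head(actor_out)   # (B, T, num_actions) action logits
        values = self.value_head(critic_out)   # (B, T, 1) state-value estimates
        return logits, values, actor_h, critic_h
```

Under this reading, a discrete navigation action would be sampled at each step from a categorical distribution over the actor logits, while the value estimates support an actor-critic RL update.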