Towards Clear Evaluation of Robotic Visual Semantic Navigation

Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre
{"title":"机器人视觉语义导航的清晰评价","authors":"Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre","doi":"10.1109/ICARA56516.2023.10125866","DOIUrl":null,"url":null,"abstract":"In this paper we address the problem of visual semantic navigation (VSN), in which a robot needs to navigate through an environment to reach an object having only access to egocentric RGB perception sensors. This is a recently explored problem, where most of the approaches leverage last advances in deep learning models for visual perception, combined with reinforcement learning (RL) strategies. Nonetheless, after a review of the literature, it is complicated to perform direct comparisons between the different solutions. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses different RL libraries. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been constructed using pyRIL, an open source python library for RL, and two navigation environments: Miniwolrd-Maze from gym-miniworld, and one 3D scene from HM3D dataset using AI Habitat simulator. We finally propose a state-of-the-art VSN model, consisting in a Contrastive Language Image Pretraining (CLIP) visual encoder plus a set of two recurrent neural networks for producing the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely: the sparse rewards problem; and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.","PeriodicalId":443572,"journal":{"name":"2023 9th International Conference on Automation, Robotics and Applications (ICARA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Clear Evaluation of Robotic Visual Semantic Navigation\",\"authors\":\"Carlos Gutiérrez-Álvarez, Sergio Hernández-García, Nadia Nasri, Alfredo Cuesta-Infante, R. López-Sastre\",\"doi\":\"10.1109/ICARA56516.2023.10125866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we address the problem of visual semantic navigation (VSN), in which a robot needs to navigate through an environment to reach an object having only access to egocentric RGB perception sensors. This is a recently explored problem, where most of the approaches leverage last advances in deep learning models for visual perception, combined with reinforcement learning (RL) strategies. Nonetheless, after a review of the literature, it is complicated to perform direct comparisons between the different solutions. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses different RL libraries. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been constructed using pyRIL, an open source python library for RL, and two navigation environments: Miniwolrd-Maze from gym-miniworld, and one 3D scene from HM3D dataset using AI Habitat simulator. 
We finally propose a state-of-the-art VSN model, consisting in a Contrastive Language Image Pretraining (CLIP) visual encoder plus a set of two recurrent neural networks for producing the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely: the sparse rewards problem; and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.\",\"PeriodicalId\":443572,\"journal\":{\"name\":\"2023 9th International Conference on Automation, Robotics and Applications (ICARA)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 9th International Conference on Automation, Robotics and Applications (ICARA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICARA56516.2023.10125866\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 9th International Conference on Automation, Robotics and Applications (ICARA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARA56516.2023.10125866","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this paper we address the problem of visual semantic navigation (VSN), in which a robot must navigate through an environment to reach a target object with access only to egocentric RGB perception. This problem has only recently been explored, and most approaches leverage the latest advances in deep learning models for visual perception combined with reinforcement learning (RL) strategies. Nonetheless, a review of the literature shows that direct comparisons between the different solutions are difficult: the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses a different RL library. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It is built on pyRIL, an open-source Python library for RL, and two navigation environments: MiniWorld-Maze from gym-miniworld, and one 3D scene from the HM3D dataset rendered with the AI Habitat simulator. Finally, we propose a state-of-the-art VSN model consisting of a Contrastive Language-Image Pretraining (CLIP) visual encoder plus two recurrent neural networks that produce the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the two main VSN challenges: the sparse-rewards problem and the exploration-exploitation trade-off. Code is available at: https://github.com/gramuah/vsn.
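
The model described in the abstract (a CLIP visual encoder feeding two recurrent networks that emit discrete actions) can be made concrete with a minimal sketch. Everything below — the class name `VSNPolicy`, the hidden sizes, the four-action space, and the way the two GRUs are stacked — is an illustrative assumption, not the authors' released implementation; see the linked repository for the actual code.

```python
# Minimal sketch of the described policy: features from a frozen CLIP image
# encoder are fed to two stacked recurrent networks, whose output is mapped
# to logits over a small set of discrete navigation actions.
import torch
import torch.nn as nn

class VSNPolicy(nn.Module):
    def __init__(self, clip_dim=512, hidden_dim=256, num_actions=4):
        super().__init__()
        # First GRU summarizes the per-frame CLIP embeddings; the second
        # integrates that summary over time before the action head.
        self.obs_rnn = nn.GRU(clip_dim, hidden_dim, batch_first=True)
        self.policy_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, clip_embeddings):
        # clip_embeddings: (batch, seq_len, clip_dim), assumed to come from a
        # frozen CLIP visual encoder applied to egocentric RGB frames.
        h, _ = self.obs_rnn(clip_embeddings)
        h, _ = self.policy_rnn(h)
        return self.action_head(h)  # per-step logits over discrete actions

# Usage: score a random 10-step trajectory of CLIP features and pick the
# greedy action for the most recent frame.
policy = VSNPolicy()
logits = policy(torch.randn(1, 10, 512))
action = logits[:, -1].argmax(dim=-1)
```

In an RL training loop these logits would parameterize the action distribution of whatever policy-gradient or value-based algorithm the RL library provides; the greedy `argmax` above is only for illustration.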
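The sparse-rewards challenge named in the abstract can likewise be illustrated with a short, hedged sketch: under a success-only reward almost every step of a long episode carries zero learning signal, whereas a distance-based shaping term (a common mitigation in navigation tasks, assumed here and not taken from the paper) provides dense feedback on progress toward the goal.

```python
# Hypothetical illustration of the sparse-rewards problem in VSN.
def sparse_reward(reached_goal: bool) -> float:
    # Success-only signal: zero on almost every transition.
    return 1.0 if reached_goal else 0.0

def shaped_reward(prev_dist: float, dist: float, reached_goal: bool) -> float:
    # Assumed dense shaping: reward the per-step decrease in distance to the
    # goal, plus the terminal success bonus.
    return (prev_dist - dist) + (1.0 if reached_goal else 0.0)
```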