Symmetry-aware Neural Architecture for Embodied Visual Navigation

IF 9.3 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision Pub Date : 2023-10-25 DOI:10.1007/s11263-023-01909-4

Shuang Liu, Masanori Suganuma, Takayuki Okatani


Citations: 0

Abstract

Existing methods for visual navigation use deep reinforcement learning as the standard tool. However, they tend to be vulnerable to statistical shifts between training and test data, and so generalize poorly to novel environments that are out-of-distribution with respect to the training data. In this study, we attempt to improve generalization by exploiting the inductive biases available for the task. Taking as the base framework active neural SLAM, which learns policies with the advantage actor-critic method, we first point out that the mappings represented by the actor and the critic should satisfy specific symmetries. We then propose a network design for the actor and the critic that inherently attains these symmetries. Specifically, we use G-convolution instead of standard convolution and insert a semi-global polar pooling layer, newly designed in this study, in the last section of the critic network. Our method can be integrated into existing methods that use intermediate goals and 2D occupancy maps. Experimental results show that our method improves generalization by a good margin on visual exploration and object goal navigation, two main embodied visual navigation tasks.
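The symmetry idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' network: it assumes, purely for illustration, the group of 90-degree rotations (p4), a single random filter, and a random 9x9 array standing in for a 2D occupancy map. The names `correlate2d_valid`, `p4_lifting_conv`, `actor_like`, and `critic_like` are hypothetical. The abstract does not state which symmetries the actor and the critic must satisfy, so the sketch assumes an actor-like output that rotates with the map and a critic-like scalar that stays unchanged. Plain global max pooling is used as a stand-in; the semi-global polar pooling layer is not reproduced here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(x, k):
    """Valid-mode 2D cross-correlation of image x with kernel k."""
    windows = sliding_window_view(x, k.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


def p4_lifting_conv(x, k):
    """Correlate x with the four 90-degree rotations of k.

    Returns an array of shape (4, H', W'); axis 0 indexes the rotation.
    """
    return np.stack([correlate2d_valid(x, np.rot90(k, r)) for r in range(4)])


rng = np.random.default_rng(0)
occupancy_map = rng.random((9, 9))    # hypothetical stand-in for a 2D map
kernel = rng.standard_normal((3, 3))  # hypothetical random filter

out = p4_lifting_conv(occupancy_map, kernel)
out_rot = p4_lifting_conv(np.rot90(occupancy_map), kernel)

# Equivariance: rotating the input rotates each response map and
# cyclically shifts the rotation axis by one step.
expected = np.stack([np.rot90(out[(r - 1) % 4]) for r in range(4)])
equivariant = np.allclose(out_rot, expected)

# Actor-like output (assumed): pooling over the rotation axis leaves a
# spatial map that rotates together with the input.
actor_like = out.max(axis=0)
actor_like_rot = out_rot.max(axis=0)
actor_equivariant = np.allclose(actor_like_rot, np.rot90(actor_like))

# Critic-like output (assumed): pooling over rotations and positions gives
# a scalar that does not change when the input is rotated.
critic_like = out.max()
critic_like_rot = out_rot.max()
critic_invariant = np.isclose(critic_like, critic_like_rot)
```

With a standard convolution, the rotated input would in general produce unrelated responses, so the network would have to learn each orientation from data. Sharing one filter across its rotations builds the symmetry into the layer itself, which is the kind of inductive bias the abstract refers to.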


Source journal

International Journal of Computer Vision (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 29.80

Self-citation rate: 2.10%

Articles published: 163

Review time: 6 months

About the journal: The International Journal of Computer Vision (IJCV) is a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal carries several article types. Regular articles, up to 25 journal pages, report significant technical advances of broad interest to the field. Short articles, limited to 10 pages, offer a faster publication path for novel research outcomes. Survey articles, up to 30 pages, critically evaluate the current state of the art or give tutorial presentations of relevant topics. In addition to technical articles, the journal includes book reviews, position papers, and editorials by prominent scientific figures, which complement the technical content. Authors are encouraged to provide supplementary material online, such as images, video sequences, data sets, and software, to aid understanding and reproducibility of the published research.