Visually Grounded Language Learning for Robot Navigation

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications
Pub Date: 2019-10-15
DOI: 10.1145/3347450.3357655

E. Ünal, Ozan Arkan Can, Y. Yemez

Citations: 3

Abstract

We present an end-to-end deep learning model for robot navigation from raw visual pixel input and natural language text instructions. The proposed model is an LSTM-based sequence-to-sequence neural network architecture with attention, trained on instruction-perception data samples collected in a synthetic environment. We conduct experiments on the SAIL dataset, which we reconstruct in 3D to generate the 2D images associated with the data. Our experiments show that the model performs on par with the state of the art, even though it learns navigational language through end-to-end training from raw visual data.
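
The abstract names attention over a sequence-to-sequence model but does not say which attention variant is used. As a minimal sketch only, assuming plain dot-product scoring (an assumption, not the authors' stated design), the following shows how one decoding step could weight the encoded instruction words. The names `attend`, `decoder_state`, and `encoder_states` are hypothetical and the vectors are toy values, not data from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One attention step (dot-product scoring is an assumption here).

    decoder_state: list of floats, the current decoder hidden state.
    encoder_states: list of equally sized float lists, one per instruction word.
    Returns (weights, context): the attention weights over the words and the
    weighted sum of the encoder states.
    """
    # Score each encoded word by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three instruction-word encodings of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
```

In a full model of the kind the abstract describes, the context vector would be combined with visual features and fed to the decoder LSTM to predict the next navigation action; that wiring is omitted here because the abstract does not detail it.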

