LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Dhruv Shah, Błażej Osiński, Brian Ichter, Sergey Levine
{"title":"LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action","authors":"Dhruv Shah, B. Osinski, Brian Ichter, S. Levine","doi":"10.48550/arXiv.2207.04429","DOIUrl":null,"url":null,"abstract":"Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"140","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.04429","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 140

Abstract

Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav
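
The abstract describes composing three pre-trained models: GPT-3 to parse free-form instructions into landmark phrases, CLIP to associate those phrases with images observed along robot trajectories, and ViNG to navigate between the grounded landmarks. As a concrete illustration of the image-language association step only, below is a minimal sketch (not the authors' released code) of how an off-the-shelf CLIP model from the Hugging Face `transformers` library could score a landmark phrase against a set of trajectory images; the model checkpoint, function name, and scoring details are illustrative assumptions rather than LM-Nav's actual implementation.

```python
# Minimal sketch: ground a landmark phrase in trajectory images with CLIP.
# Assumptions: Hugging Face transformers CLIP checkpoint and a simple
# argmax over image-text similarity; LM-Nav's real pipeline may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ground_landmark(landmark: str, image_paths: list[str]) -> int:
    """Return the index of the trajectory image that best matches the phrase."""
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[landmark], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, 1): similarity of each image
    # to the single landmark phrase.
    scores = outputs.logits_per_image.squeeze(-1)
    return int(scores.argmax())

# Hypothetical usage: pick the frame most likely to show a fire hydrant.
# best = ground_landmark("a photo of a fire hydrant", ["frame0.jpg", "frame1.jpg"])
```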