LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Dhruv Shah, Błażej Osiński, Brian Ichter, Sergey Levine
{"title":"LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action","authors":"Dhruv Shah, B. Osinski, Brian Ichter, S. Levine","doi":"10.48550/arXiv.2207.04429","DOIUrl":null,"url":null,"abstract":"Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"140","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.04429","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 140

Abstract

Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav
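
The abstract describes composing three pre-trained models: GPT-3 to parse free-form instructions into landmark phrases, CLIP to associate those phrases with images observed along robot trajectories, and ViNG to navigate between the grounded landmarks. As a concrete illustration of the image-language association step only, below is a minimal sketch (not the authors' released code) of how an off-the-shelf CLIP model from the Hugging Face `transformers` library could score a landmark phrase against a set of trajectory images; the model checkpoint, function name, and scoring details are illustrative assumptions rather than LM-Nav's actual implementation.

```python
# Minimal sketch: ground a landmark phrase in trajectory images with CLIP.
# Assumptions: Hugging Face transformers CLIP checkpoint and a simple
# argmax over image-text similarity; LM-Nav's real pipeline may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ground_landmark(landmark: str, image_paths: list[str]) -> int:
    """Return the index of the trajectory image that best matches the phrase."""
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[landmark], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, 1): similarity of each image
    # to the single landmark phrase.
    scores = outputs.logits_per_image.squeeze(-1)
    return int(scores.argmax())

# Hypothetical usage: pick the frame most likely to show a fire hydrant.
# best = ground_landmark("a photo of a fire hydrant", ["frame0.jpg", "frame1.jpg"])
```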