Learning to Navigate

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications, Pub Date: 2019-10-15, DOI: 10.1145/3347450.3357659

Piotr Wojciech Mirowski


Citations: 10

Abstract


Learning to Navigate

Navigation is an important cognitive task that enables humans and animals to traverse long distances in a complex world, with or without maps. Such long-range navigation can simultaneously support self-localisation ("I am here") and a representation of the goal ("I am going there"). For this reason, studying navigation is fundamental to the study and development of artificial intelligence, and trying to replicate navigation in artificial agents can also help neuroscientists understand its biological underpinnings. This talk will cover our own journey to understand navigation by building deep reinforcement learning agents, starting from learning to control a simple agent that can explore and memorise large 3D mazes, and moving on to designing agents with a read-write memory that can generalise to unseen mazes from one traversal. I will show how these artificial agents relate to navigation in the real world, both through the study of the emergence of grid cell representations in neural networks and by demonstrating that these agents can navigate in Street View-based real-world photographic environments. Finally, I will present two approaches from our ongoing work on leveraging multimodal information to generalise navigation policies to unseen environments in Street View: the first consists of following language instructions, and the second of transferring navigation policies by training on aerial views.


Source journal

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications

Self-citation rate

0.00%

Number of publications