Learning to Navigate
Piotr Wojciech Mirowski
1st International Workshop on Multimodal Understanding and Learning for Embodied Applications, 2019-10-15
DOI: 10.1145/3347450.3357659
Navigation is an important cognitive task that enables humans and animals to travel long distances through a complex world, with or without maps. Such long-range navigation can simultaneously support self-localisation ("I am here") and a representation of the goal ("I am going there"). For this reason, studying navigation is fundamental to the study and development of artificial intelligence, and trying to replicate navigation in artificial agents can also help neuroscientists understand its biological underpinnings. This talk will cover our own journey to understand navigation by building deep reinforcement learning agents. It starts with learning to control a simple agent that can explore and memorise large 3D mazes, and ends with designing agents with a read-write memory that can generalise to unseen mazes after a single traversal. I will show how these artificial agents relate to navigation in the real world in two ways: by studying the emergence of grid cell representations in neural networks, and by demonstrating that these agents can navigate in real-world photographic environments based on Street View. Finally, I will present two approaches from our ongoing work on leveraging multimodal information to generalise navigation policies to unseen environments in Street View: the first follows language instructions, and the second transfers navigation policies by training on aerial views.