Automated construction of cognitive maps with visual predictive coding

IF 18.8 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
James Gornet, Matt Thomson
{"title":"Automated construction of cognitive maps with visual predictive coding","authors":"James Gornet, Matt Thomson","doi":"10.1038/s42256-024-00863-1","DOIUrl":null,"url":null,"abstract":"Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. Although machine learning algorithms like simultaneous localization and mapping utilize specialized inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile and linguistic inputs. Here we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next-image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects spatial distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information.The predictive coding network generates a vectorized encoding of the environment that supports vector navigation, where individual latent space units delineate localized, overlapping neighbourhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor and linguistic inputs. Constructing spatial maps from sensory inputs is challenging in both neuroscience and artificial intelligence. Gornet and Thomson show that as an agent navigates an environment, a self-attention neural network using predictive coding can recover the environment’s map in its latent space.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"6 7","pages":"820-833"},"PeriodicalIF":18.8000,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s42256-024-00863-1.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.nature.com/articles/s42256-024-00863-1","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. Although machine learning algorithms like simultaneous localization and mapping utilize specialized inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile and linguistic inputs. Here we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next-image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects spatial distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information.The predictive coding network generates a vectorized encoding of the environment that supports vector navigation, where individual latent space units delineate localized, overlapping neighbourhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor and linguistic inputs. Constructing spatial maps from sensory inputs is challenging in both neuroscience and artificial intelligence. Gornet and Thomson show that as an agent navigates an environment, a self-attention neural network using predictive coding can recover the environment’s map in its latent space.

Abstract Image

Abstract Image

利用视觉预测编码自动构建认知地图
人类直接从感官输入构建内部环境认知地图,而无需使用明确的坐标或距离测量系统。虽然机器学习算法(如同步定位和映射)利用专门的推理程序来识别视觉特征,并从视觉和里程数据中构建空间地图,但大脑中认知地图的普遍性表明,一种统一的映射算法策略可以推广到听觉、触觉和语言输入。在这里,我们证明了预测编码为利用感官数据构建空间地图提供了一种自然、通用的神经网络算法。我们介绍了一个框架,在这个框架中,一个代理在虚拟环境中进行导航,同时利用一个配备自我注意力的卷积神经网络进行视觉预测编码。在学习下一个图像预测任务时,代理会自动构建一个定量反映空间距离的内部环境表征。预测编码网络可生成支持矢量导航的环境矢量化编码,其中单个潜在空间单元可划定环境中局部重叠的邻域。从广义上讲,我们的工作将预测编码引入了构建认知地图的统一算法框架,该框架可自然扩展到听觉、感觉运动和语言输入的映射。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
36.90
自引率
2.10%
发文量
127
期刊介绍: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信