Multiarea processing in body patches of the primate inferotemporal cortex implements inverse graphics.

IF 9.1 · Zone 1, Multidisciplinary · Q1 MULTIDISCIPLINARY SCIENCES

Proceedings of the National Academy of Sciences of the United States of America · Pub Date: 2025-07-08 · DOI: 10.1073/pnas.2420287122

Hakan Yilmaz, Aalap D Shah, Ariadne Letrou, Satwant Kumar, Rufin Vogels, Ilker Yildirim


Citation count: 0

Abstract

Stimulus-driven, multiarea processing in the inferotemporal (IT) cortex is thought to be critical for transforming sensory inputs into useful representations of the world. What are the formats of these neural representations and how are they computed across the nodes of the IT networks? A growing literature in computational neuroscience focuses on the computational-level objective of acquiring high-level image statistics that supports useful distinctions, including between object identities or categories. Here, inspired by classic theories of vision, we suggest an alternative possibility. We show that inferring 3D objects may be a distinct computational-level objective of IT, implemented via an algorithm analogous to graphics-based generative models of how 3D scenes form and project to images, but in the reverse order. Using perception of bodies as a case study, we show that inverse graphics spontaneously emerges in inference networks trained to map images to 3D objects. Remarkably, this correspondence to the reverse of a graphics-based generative model also holds across the body processing network of the macaque IT cortex. Finally, inference networks recapitulate the feedforward progression across the stages of this IT network and do so better than the currently dominant vision models, including both supervised and unsupervised variants, none of which aligns with the reverse of graphics. This work suggests inverse graphics as a multiarea neural algorithm implemented within IT, and points to ways for replicating primate vision capabilities in machines.




Source journal

Proceedings of the National Academy of Sciences of the United States of America (Multidisciplinary)

CiteScore: 19.00

Self-citation rate: 0.90%

Articles published: 3575

Review time: 2.5 months

Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.