Zero-shot counting with a dual-stream neural network model
Jessica A F Thompson, Hannah Sheahan, Tsvetomira Dumbalska, Julian D Sandbrink, Manuela Piazza, Christopher Summerfield
Neuron (journal article), published 2024-10-29. Impact factor: 14.7 (JCR Q1, Neurosciences)
DOI: 10.1016/j.neuron.2024.10.008 (https://doi.org/10.1016/j.neuron.2024.10.008)
Citations: 0
Abstract
To understand a visual scene, observers need to both recognize objects and encode relational structure. For example, a scene comprising three apples requires the observer to encode concepts of "apple" and "three." In the primate brain, these functions rely on dual (ventral and dorsal) processing streams. Object recognition in primates has been successfully modeled with deep neural networks, but how scene structure (including numerosity) is encoded remains poorly understood. Here, we built a deep learning model, based on the dual-stream architecture of the primate brain, which is able to count items "zero-shot," that is, even if the objects themselves are unfamiliar. Our dual-stream network forms spatial response fields and lognormal number codes that resemble those observed in the macaque posterior parietal cortex. The dual-stream network also makes successful predictions about human counting behavior. Our results provide evidence for an enactive theory of the role of the posterior parietal cortex in visual scene understanding.
About the journal:
Established as a highly influential journal in neuroscience, Neuron is widely relied upon in the field. The editors adopt interdisciplinary strategies, integrating biophysical, cellular, developmental, and molecular approaches alongside a systems approach to sensory, motor, and higher-order cognitive functions. Serving as a premier intellectual forum, Neuron holds a prominent position in the entire neuroscience community.