Spatiotemporal evidence accumulation through saccadic sampling for object recognition.

Impact Factor: 4.0 · JCR Q1 (Neuroscience) · CAS Region 2 (Medicine)
Zhihao Zheng, Jiaqi Hu, Gouki Okazawa
DOI: 10.1523/JNEUROSCI.2453-24.2025
Journal: Journal of Neuroscience · Published 2025-09-16 · Citations: 0

Abstract

Visual object recognition has been extensively studied under fixation conditions, but our natural viewing involves frequent saccadic eye movements that scan multiple local informative features within an object (e.g., eyes and mouth in a face image). These saccades would contribute to object recognition by subserving the integration of sensory information across local features, but mechanistic models underlying this process have yet to be established due to the presumed complexity of the interactions between the visual and oculomotor systems. Here, we employ a framework of perceptual decision making and show that human object categorization behavior with saccades can be quantitatively explained by a model that simply accumulates the sensory evidence available at each moment. Human participants of both sexes performed face and object categorization while they were allowed to freely make saccades to scan local features. Our model could successfully fit the data even during such a free viewing condition, departing from past studies that required controlled eye movements to test trans-saccadic integration. Moreover, further experimental results confirmed that active saccade commands (efference copy) do not substantially contribute to evidence accumulation. Therefore, we propose that object recognition with saccades can be approximated by a parsimonious decision-making model without assuming complex interactions between the visual and oculomotor systems.

Significance Statement

When we view an object to judge its identity or properties, we move our eyes to inspect multiple local features, gathering dynamic information. How does object recognition unfold during this complex sequence of events? To explain object recognition with saccades, should we model precisely how the visual and oculomotor systems exchange information in the brain? Instead, we demonstrate that human object recognition can be quantitatively explained by a decision-making model that processes each snapshot of an image sequence and simply integrates information over the course of multiple eye movements. This model approximates human behavior without additional mechanisms, even under experimental conditions in which people freely move their eyes to scan local features without constraint during face and object recognition.
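The model described in the abstract accumulates momentary sensory evidence across successive fixations until a decision bound is reached. A minimal sketch of that idea, in the style of a standard bounded evidence-accumulation (drift-diffusion) process, is shown below. All parameter values, the function name, and the mapping from fixated features to drift rates are illustrative assumptions, not quantities taken from the paper.

```python
import random

def simulate_trial(fixation_strengths, bound=1.0, noise_sd=0.4,
                   dt=0.01, fixation_dur=0.25, seed=None):
    """Accumulate momentary evidence across a sequence of fixations.

    Each entry in `fixation_strengths` is the mean evidence rate (drift)
    available while the eyes rest on one local feature; positive values
    favor category A, negative values category B. The decision variable
    simply carries over across saccades, as in the paper's parsimonious
    account. All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    v = 0.0   # accumulated decision variable
    t = 0.0   # elapsed time (s)
    for drift in fixation_strengths:
        for _ in range(int(fixation_dur / dt)):
            # momentary evidence: drift plus Gaussian diffusion noise
            v += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
            t += dt
            if abs(v) >= bound:
                return ("A" if v > 0 else "B"), t
    # no bound crossing within the fixation sequence:
    # decide by the sign of the accumulated evidence
    return ("A" if v >= 0 else "B"), t

# e.g., three fixations on local features of varying informativeness
choice, rt = simulate_trial([0.8, 0.2, 0.8], seed=1)
```

The key design point is that nothing in the loop depends on the saccade commands themselves: evidence from each fixation snapshot is simply summed, which is the sense in which the authors' model needs no visual-oculomotor interaction.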

Source Journal: Journal of Neuroscience (Medicine – Neuroscience)
CiteScore: 9.30
Self-citation rate: 3.80%
Annual article count: 1164
Review time: 12 months
Journal description: JNeurosci (ISSN 0270-6474) is an official journal of the Society for Neuroscience. It is published weekly by the Society, fifty weeks a year, one volume per year. JNeurosci publishes papers on a broad range of topics of general interest to those working on the nervous system. Authors now have an Open Choice option for their published articles.