{"title":"Spatiotemporal evidence accumulation through saccadic sampling for object recognition.","authors":"Zhihao Zheng, Jiaqi Hu, Gouki Okazawa","doi":"10.1523/JNEUROSCI.2453-24.2025","DOIUrl":null,"url":null,"abstract":"<p><p>Visual object recognition has been extensively studied under fixation conditions, but our natural viewing involves frequent saccadic eye movements that scan multiple local informative features within an object (e.g., eyes and mouth in a face image). These saccades would contribute to object recognition by subserving the integration of sensory information across local features, but mechanistic models underlying this process have yet to be established due to the presumed complexity of the interactions between the visual and oculomotor systems. Here, we employ a framework of perceptual decision making and show that human object categorization behavior with saccades can be quantitatively explained by a model that simply accumulates the sensory evidence available at each moment. Human participants of both sexes performed face and object categorization while they were allowed to freely make saccades to scan local features. Our model could successfully fit the data even during such a free viewing condition, departing from past studies that required controlled eye movements to test trans-saccadic integration. Moreover, further experimental results confirmed that active saccade commands (efference copy) do not substantially contribute to evidence accumulation. Therefore, we propose that object recognition with saccades can be approximated by a parsimonious decision-making model without assuming complex interactions between the visual and oculomotor systems.<b>Significance statement</b> When we view an object to judge its identity or properties, we move our eyes to inspect multiple local features, gathering dynamic information. How does object recognition unfold during this complex sequence of events? 
To explain object recognition with saccades, should we model precisely how the visual and oculomotor systems exchange information in the brain? Instead, we demonstrate that human object recognition can be quantitatively explained by a decision-making model that processes each snapshot of an image sequence and simply integrates information over the course of multiple eye movements. This model approximates human behavior without additional mechanisms, even under experimental conditions in which people freely move their eyes to scan local features without constraint during face and object recognition.</p>","PeriodicalId":50114,"journal":{"name":"Journal of Neuroscience","volume":" ","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1523/JNEUROSCI.2453-24.2025","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Visual object recognition has been extensively studied under fixation conditions, but our natural viewing involves frequent saccadic eye movements that scan multiple local informative features within an object (e.g., the eyes and mouth in a face image). These saccades are thought to contribute to object recognition by supporting the integration of sensory information across local features, but mechanistic models of this process have yet to be established due to the presumed complexity of the interactions between the visual and oculomotor systems. Here, we employ a framework of perceptual decision making and show that human object categorization behavior with saccades can be quantitatively explained by a model that simply accumulates the sensory evidence available at each moment. Human participants of both sexes performed face and object categorization while they were allowed to freely make saccades to scan local features. Our model successfully fit the data even under this free-viewing condition, departing from past studies that required controlled eye movements to test trans-saccadic integration. Moreover, further experimental results confirmed that active saccade commands (efference copy) do not substantially contribute to evidence accumulation. Therefore, we propose that object recognition with saccades can be approximated by a parsimonious decision-making model without assuming complex interactions between the visual and oculomotor systems.
Significance statement
When we view an object to judge its identity or properties, we move our eyes to inspect multiple local features, gathering dynamic information. How does object recognition unfold during this complex sequence of events? To explain object recognition with saccades, should we model precisely how the visual and oculomotor systems exchange information in the brain?
Instead, we demonstrate that human object recognition can be quantitatively explained by a decision-making model that processes each snapshot of an image sequence and simply integrates information over the course of multiple eye movements. This model approximates human behavior without additional mechanisms, even under experimental conditions in which people freely move their eyes to scan local features without constraint during face and object recognition.
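The core idea of the model described above, accumulating momentary sensory evidence across successive fixation snapshots until a decision bound is reached, can be illustrated with a minimal toy sketch. All names, parameter values, and the bound-crossing rule here are illustrative assumptions, not the authors' fitted model:

```python
import random

def accumulate_evidence(samples, bound=2.0):
    """Toy accumulation-to-bound decision process.

    `samples` is a sequence of momentary evidence values, one per
    fixation snapshot (positive favors category A, negative category B).
    Evidence is simply summed across fixations, with no saccade-specific
    mechanism (no efference-copy term). Returns (choice, n_samples_used);
    choice is None if neither bound is crossed.
    """
    total = 0.0
    for i, s in enumerate(samples, start=1):
        total += s  # plain summation of evidence across snapshots
        if total >= bound:
            return "category A", i
        if total <= -bound:
            return "category B", i
    return None, len(samples)

# Example: weak positive evidence at each fixation, corrupted by noise
random.seed(0)
fixation_evidence = [0.3 + random.gauss(0, 0.5) for _ in range(20)]
choice, n_fixations = accumulate_evidence(fixation_evidence)
```

In this sketch, faster decisions fall out naturally when the local features sampled early are informative, while ambiguous features prolong the accumulation, which is the behavioral signature such models are fit against.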
Journal description:
JNeurosci (ISSN 0270-6474) is an official journal of the Society for Neuroscience. It is published weekly by the Society, fifty weeks a year, in one volume per year. JNeurosci publishes papers on a broad range of topics of general interest to those working on the nervous system. Authors now have an Open Choice option for their published articles.