Understanding eye movements: psychophysics and a model of primary visual cortex

David Berga
{"title":"Understanding eye movements: psychophysics and a model of primary visual cortex","authors":"David Berga","doi":"10.5565/rev/elcvia.1193","DOIUrl":null,"url":null,"abstract":"Humans move their eyes in order to learn visual representations of the world. These eye movements depend on distinct factors, either by the scene that we perceive or by our own decisions. To select what is relevant to attend is part of our survival mechanisms and the way we build reality, as we constantly react both consciously and unconsciously to all the stimuli that is projected into our eyes. In this thesis we try to explain (1) how we move our eyes, (2) how to build machines that understand visual information and deploy eye movements, and (3) how to make these machines understand tasks in order to decide for eye movements. (1) We provided the analysis of eye movement behavior elicited by low-level feature distinctiveness with a dataset of 230 synthetically-generated image patterns. A total of 15 types of stimuli has been generated (e.g. orientation, brightness, color, size, etc.), with 7 feature contrasts for each feature category. Eye-tracking data was collected from 34 participants during the viewing of the dataset, using Free-Viewing and Visual Search task instructions. Results showed that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. From such dataset (SID4VAM), we have computed a benchmark of saliency models by testing performance using psychophysical patterns. Model performance has been evaluated considering model inspiration and consistency with human psychophysics. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well with synthetic pattern images, instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. (2) Computations in the primary visual cortex (area V1 or striate cortex) have long been hypothesized to be responsible, among several visual processing mechanisms, of bottom-up visual attention (also named saliency). In order to validate this hypothesis, images from eye tracking datasets have been processed with a biologically plausible model of V1 (named Neurodynamic Saliency Wavelet Model or NSWAM). Following Li's neurodynamic model, we define V1's lateral connections with a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation and scale. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. The resulting saliency maps are generated from the model output, representing the neuronal activity of V1 projections towards brain areas involved in eye movement control. We want to pinpoint that our unified computational architecture is able to reproduce several visual processes (i.e.  brightness, chromatic induction and visual discomfort) without applying any type of training or optimization and keeping the same parametrization. The model has been extended (NSWAM-CM) with an implementation of the cortical magnification function to define the retinotopical projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms), are also proposed to predict attention in Free-Viewing and Visual Search conditions. 
Results show that our model outpeforms other biologically-inpired models of saliency prediction as well as to predict visual saccade sequences, specifically for nature and synthetic images. We also show how temporal and spatial characteristics of inhibition of return can improve prediction of saccades, as well as how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention at distinct image contexts. (3) Although previous scanpath models have been able to efficiently predict saccades during Free-Viewing, it is well known that stimulus and task instructions can strongly affect eye movement patterns. In particular, task priming has been shown to be crucial to the deployment of eye movements, involving interactions between brain areas related to goal-directed behavior, working and long-term memory in combination with stimulus-driven eye movement neuronal correlates. In our latest study we proposed an extension of the Selective Tuning Attentive Reference Fixation Controller Model based on task demands (STAR-FCT), describing novel computational definitions of Long-Term Memory, Visual Task Executive and Task Working Memory. With these modules we are able to use textual instructions in order to guide the model to attend to specific categories of objects and/or places in the scene. We have designed our memory model by processing a visual hierarchy of low- and high-level features. The relationship between the executive task instructions and the memory representations has been specified using a tree of semantic similarities between the learned features and the object category labels. Results reveal that by using this model, the resulting object localization maps and predicted saccades have a higher probability to fall inside the salient regions depending on the distinct task instructions compared to saliency.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Letters on Computer Vision and Image Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5565/rev/elcvia.1193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
Citations: 1

Abstract

Humans move their eyes in order to learn visual representations of the world. These eye movements depend on distinct factors, driven either by the scene we perceive or by our own decisions. Selecting what is relevant to attend to is part of our survival mechanisms and of the way we build reality, as we constantly react, both consciously and unconsciously, to all the stimuli that are projected onto our eyes. In this thesis we try to explain (1) how we move our eyes, (2) how to build machines that understand visual information and deploy eye movements, and (3) how to make these machines understand tasks in order to decide on eye movements.

(1) We provide an analysis of eye movement behavior elicited by low-level feature distinctiveness, using a dataset of 230 synthetically generated image patterns. A total of 15 types of stimuli were generated (e.g. orientation, brightness, color, size), with 7 feature contrasts for each feature category. Eye-tracking data were collected from 34 participants viewing the dataset under Free-Viewing and Visual Search task instructions. Results showed that saliency is predominantly and distinctively influenced by: 1. feature type, 2. feature contrast, 3. temporality of fixations, 4. task difficulty and 5. center bias. From this dataset (SID4VAM) we computed a benchmark of saliency models, testing their performance on psychophysical patterns. Model performance was evaluated considering both model inspiration and consistency with human psychophysics. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well on synthetic pattern images; instead, models with a Spectral/Fourier inspiration outperform the others on saliency metrics and are more consistent with human psychophysical experimentation.

(2) Computations in the primary visual cortex (area V1, or striate cortex) have long been hypothesized to be responsible, among several visual processing mechanisms, for bottom-up visual attention (also called saliency). To validate this hypothesis, images from eye-tracking datasets were processed with a biologically plausible model of V1 (the Neurodynamic Saliency Wavelet Model, NSWAM). Following Li's neurodynamic model, we define V1's lateral connections with a network of firing-rate neurons sensitive to visual features such as brightness, color, orientation and scale. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. The resulting saliency maps are generated from the model output, representing the neuronal activity of V1 projections towards brain areas involved in eye movement control. We want to stress that our unified computational architecture is able to reproduce several visual processes (i.e. brightness and chromatic induction, and visual discomfort) without any type of training or optimization and while keeping the same parametrization. The model has been extended (NSWAM-CM) with an implementation of the cortical magnification function to define the retinotopic projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms) are also proposed to predict attention in Free-Viewing and Visual Search conditions. Results show that our model outperforms other biologically-inspired models in saliency prediction as well as in predicting visual saccade sequences, specifically for natural and synthetic images. We also show how the temporal and spatial characteristics of inhibition of return can improve the prediction of saccades, and how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention in distinct image contexts.
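The abstract does not spell out the model equations, but the kind of firing-rate dynamics it refers to (leaky units driven by the stimulus and by excitatory/inhibitory lateral connections, in the spirit of Li's V1 model) can be illustrated with a minimal sketch. Everything below, including the function name, parameter values and the toy connectivity matrix, is a hypothetical illustration, not the NSWAM implementation:

```python
import numpy as np

def simulate_firing_rates(stimulus, W_lateral, tau=10.0, dt=1.0, steps=200):
    """Toy firing-rate network: each unit integrates feedforward drive from the
    stimulus plus lateral input from the other units (illustrative only; the
    actual NSWAM equations follow Li's V1 neurodynamic model)."""
    x = np.zeros_like(stimulus, dtype=float)        # membrane potentials
    g = lambda v: np.maximum(np.tanh(v), 0.0)       # rectified saturating transfer function
    for _ in range(steps):
        lateral = W_lateral @ g(x)                  # excitatory/inhibitory lateral interactions
        dx = (-x + stimulus + lateral) / tau        # leaky integration towards the inputs
        x = x + dt * dx
    return g(x)                                     # firing rates, read out as saliency responses

# Usage: a 1-D "feature contrast" stimulus where one unit differs from its neighbours
stim = np.full(16, 0.2); stim[8] = 1.0              # feature singleton
W = 0.1 * (np.eye(16, k=1) + np.eye(16, k=-1)) - 0.05 * np.ones((16, 16))
rates = simulate_firing_rates(stim, W)
print(rates.round(2))                               # the singleton unit should pop out with higher activity
```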
(3) Although previous scanpath models have been able to efficiently predict saccades during Free-Viewing, it is well known that stimulus and task instructions can strongly affect eye movement patterns. In particular, task priming has been shown to be crucial to the deployment of eye movements, involving interactions between brain areas related to goal-directed behavior, working memory and long-term memory, in combination with stimulus-driven eye movement neuronal correlates. In our latest study we propose an extension of the Selective Tuning Attentive Reference Fixation Controller model based on task demands (STAR-FCT), describing novel computational definitions of Long-Term Memory, Visual Task Executive and Task Working Memory. With these modules we are able to use textual instructions to guide the model to attend to specific categories of objects and/or places in the scene. We designed our memory model by processing a visual hierarchy of low- and high-level features. The relationship between the executive task instructions and the memory representations is specified using a tree of semantic similarities between the learned features and the object category labels. Results reveal that, using this model, the resulting object localization maps and predicted saccades have a higher probability of falling inside the salient regions for the distinct task instructions than saliency alone.
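As with the V1 sketch above, the following is only a rough, hypothetical illustration of the task-guidance idea: localization maps for object categories are weighted by how semantically similar each category is to the textual task instruction, and the most relevant location is fixated next. The function names, the similarity function and the combination rule are assumptions for illustration, not the STAR-FCT code:

```python
import numpy as np

def task_biased_map(object_maps, category_labels, task_label, similarity):
    """Weight per-category object localization maps by their semantic similarity
    to the task instruction, then combine them into a single priority map
    (illustrative only; not the STAR-FCT implementation)."""
    weights = np.array([similarity(task_label, c) for c in category_labels])
    weights = weights / (weights.sum() + 1e-8)             # normalise task relevance
    return np.tensordot(weights, object_maps, axes=1)      # weighted sum over category maps

def next_fixation(priority_map, inhibition=None):
    """Pick the most salient location, optionally suppressing already-visited
    locations (a crude stand-in for inhibition of return)."""
    m = priority_map if inhibition is None else priority_map * (1.0 - inhibition)
    return np.unravel_index(np.argmax(m), m.shape)

# Toy usage: two 4x4 category maps ("dog", "car") and a task instruction asking for animals
maps = np.stack([np.eye(4), np.fliplr(np.eye(4))])
sim = lambda a, b: 1.0 if (a, b) in {("animal", "dog")} else 0.1   # hypothetical similarity tree lookup
fix = next_fixation(task_biased_map(maps, ["dog", "car"], "animal", sim))
print(fix)  # expected to land on a peak of the "dog" map
```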