Putting visual object recognition in context.

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Pub Date: 2020-06-01. Epub Date: 2020-08-05. DOI: 10.1109/CVPR42600.2020.01300

Mengmi Zhang, Claire Tseng, Gabriel Kreiman

Volume 2020, pages 12982-12991.

Citations: 36

Abstract

Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To understand and model the role of contextual information in visual recognition, we systematically and quantitatively investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, time required to incorporate contextual information, and temporal dynamics of contextual modulation. The tasks involve recognizing a target object surrounded by context in a natural image. As an essential benchmark, we first describe a series of psychophysics experiments in which we alter one aspect of context at a time and quantify human recognition accuracy. To computationally assess performance on the same tasks, we propose a biologically inspired, context-aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates both object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human-level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition.
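To make the task setup concrete, the sketch below shows one way the two inputs of a two-stream model could be prepared from a single image: a tight crop of the target object (the "foveal" input) and a larger crop around it (the "peripheral" or context input), with a parameter that controls the amount of context. This is an illustrative assumption, not the authors' implementation: the box convention `(top, left, height, width)`, the `context_scale` parameter, and the function names `crop`, `expand_box`, and `two_stream_inputs` are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical convention


def crop(image: Image, box: Box) -> Image:
    """Return the pixels inside `box`, clipped to the image bounds."""
    top, left, height, width = box
    rows, cols = len(image), len(image[0])
    r0, r1 = max(top, 0), min(top + height, rows)
    c0, c1 = max(left, 0), min(left + width, cols)
    return [row[c0:c1] for row in image[r0:r1]]


def expand_box(box: Box, scale: float) -> Box:
    """Grow `box` about its centre so each side is `scale` times longer."""
    top, left, height, width = box
    new_h, new_w = round(height * scale), round(width * scale)
    return (top - (new_h - height) // 2, left - (new_w - width) // 2, new_h, new_w)


def two_stream_inputs(image: Image, box: Box, context_scale: float) -> Tuple[Image, Image]:
    """Build the object crop and the surrounding context crop for one target.

    `context_scale` is an assumed knob for the amount of context:
    1.0 means no extra context, larger values include more surroundings.
    """
    object_crop = crop(image, box)
    context_crop = crop(image, expand_box(box, context_scale))
    return object_crop, context_crop


if __name__ == "__main__":
    # Toy 8x8 image whose pixel value encodes its position (row * 8 + col).
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    obj, ctx = two_stream_inputs(image, (3, 3, 2, 2), context_scale=2.0)
    print(len(obj), len(obj[0]))  # size of the object crop
    print(len(ctx), len(ctx[0]))  # size of the context crop
```

Varying `context_scale` while holding the object crop fixed mirrors the idea of altering one aspect of context at a time; in a real pipeline both crops would then be resized and passed to separate feature extractors.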

Source journal

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition

CiteScore

43.50

Self-citation rate

0.00%

Number of publications