When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes

Philipp Bomatter, Mengmi Zhang, Dimitar Karev, Spandan Madan, Claire Tseng, Gabriel Kreiman
{"title":"When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes.","authors":"Philipp Bomatter, Mengmi Zhang, Dimitar Karev, Spandan Madan, Claire Tseng, Gabriel Kreiman","doi":"10.1109/iccv48922.2021.00032","DOIUrl":null,"url":null,"abstract":"<p><p>Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic <b>O</b>ut-of-<b>C</b>ontext <b>D</b>ataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and machine vision using OCD. We conducted psychophysics experiments to establish a human benchmark for out-of-context recognition, and then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model, fusing object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext.</p>","PeriodicalId":72022,"journal":{"name":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","volume":" ","pages":"255-264"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9432425/pdf/nihms-1831598.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"... IEEE International Conference on Computer Vision workshops. IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iccv48922.2021.00032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/2/28 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects, including physics rules, statistical co-occurrences, and relative object sizes. While previous work has relied on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control gravity, object co-occurrences, and relative object sizes across 36 object categories in a virtual household environment. Using OCD, we conducted a series of experiments to gain insight into the impact of contextual cues on both human and machine vision. Psychophysics experiments established a human benchmark for out-of-context recognition, which we then compared against state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model that fuses object and contextual information via multi-head attention. Our model captures information useful for contextual reasoning, achieving human-level performance and better robustness in out-of-context conditions than baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext.
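To make the fusion mechanism concrete, below is a minimal PyTorch sketch of the core idea: an embedding of the target-object crop attends to embeddings of the surrounding scene through multi-head attention before classification. The module name, dimensions, and the single-query cross-attention layout are illustrative assumptions, not the authors' implementation; their actual architecture is in the linked repository.

```python
# Hypothetical sketch of object-context fusion via multi-head attention.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ContextFusionHead(nn.Module):
    def __init__(self, dim=512, heads=8, num_classes=36):
        super().__init__()
        # Cross-attention: the object token queries the context tokens.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, obj_feat, ctx_feats):
        # obj_feat:  (B, 1, dim)  embedding of the target-object crop
        # ctx_feats: (B, N, dim)  embeddings of context patches/objects
        fused, _ = self.attn(query=obj_feat, key=ctx_feats, value=ctx_feats)
        fused = self.norm(obj_feat + fused)       # residual connection
        return self.classifier(fused.squeeze(1))  # (B, num_classes)

# Toy usage with random features standing in for backbone outputs.
head = ContextFusionHead()
obj = torch.randn(2, 1, 512)    # two object crops
ctx = torch.randn(2, 49, 512)   # 49 context tokens each
logits = head(obj, ctx)         # (2, 36) -- one logit per OCD category
```

Cross-attention of this kind lets the object representation weight which context elements matter (e.g., a supporting surface versus a distant wall), which is one natural way to realize the object-context fusion the abstract describes.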
