态势识别的混合核图注意网络

2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2019-10-01 DOI:10.1109/ICCV.2019.01046

M. Suhail, L. Sigal

{"title":"态势识别的混合核图注意网络","authors":"M. Suhail, L. Sigal","doi":"10.1109/ICCV.2019.01046","DOIUrl":null,"url":null,"abstract":"Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables dynamic graph structure during training and inference, through the use of a graph attention mechanism, and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"19 1","pages":"10362-10371"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Mixture-Kernel Graph Attention Network for Situation Recognition\",\"authors\":\"M. Suhail, L. Sigal\",\"doi\":\"10.1109/ICCV.2019.01046\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables dynamic graph structure during training and inference, through the use of a graph attention mechanism, and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.\",\"PeriodicalId\":6728,\"journal\":{\"name\":\"2019 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"volume\":\"19 1\",\"pages\":\"10362-10371\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2019.01046\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2019.01046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

理解突出动作之外的图像需要对场景背景、对象以及它们在捕获事件中扮演的角色进行推理。情境识别是一种以动作框架的形式对动词(动作)和一组语义-角色和实体(名词)对进行联合推理的任务。用动作框架标记图像需要根据观察到的图像内容为角色分配值(名词)。其中固有的挑战是输出角色分配之间的丰富条件结构化依赖关系和整体语义稀疏性。在本文中，我们提出了一种新的混合核注意图神经网络(GNN)架构来解决这些挑战。我们的GNN通过使用图注意机制和角色对之间的上下文感知交互，在训练和推理期间实现动态图结构。我们通过在imSitu基准数据集上进行实验来说明我们的模型和设计选择的有效性，其精度比最先进的精度提高了10%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Mixture-Kernel Graph Attention Network for Situation Recognition

Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables dynamic graph structure during training and inference, through the use of a graph attention mechanism, and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

自引率

0.00%

发文量