Semantic relation graph reasoning network for visual question answering

International Conference on Signal Processing Systems | Pub Date: 2021-01-20 | DOI: 10.1117/12.2588837

Hong Lan, Pufen Zhang

Citations: 1

Abstract

To answer semantically complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the dynamic interactions between different objects. This task inherently requires reasoning about the visual relationships among the objects in the image. Meanwhile, the visual reasoning process should be guided by the information in the question. In this paper, we propose a semantic relation graph reasoning network in which the process of semantic relation reasoning is guided by a cross-modal attention mechanism. In addition, a Gated Graph Convolutional Network (GGCN), constructed from the cross-modal attention weights, injects the semantic interaction information between objects into their visual features, producing relation-aware features. In particular, we train a semantic relationship detector to extract the semantic relationships between objects, which are used to construct the semantic relation graph. Experiments demonstrate that the proposed model outperforms most state-of-the-art methods on the VQA v2.0 benchmark dataset.
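The abstract gives no equations, so the following is only a rough sketch of what a question-guided, gated update over a relation graph could look like, not the authors' GGCN. Everything in it is an assumption made for illustration: the function name `gated_relation_update`, the use of plain dot products in place of learned projections, the additive message `v_j + r_ij`, the sigmoid gate, and the requirement that object features, relation embeddings, and the question vector all share one dimension.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_relation_update(features, edges, question):
    """Hypothetical relation-aware update (illustrative only).

    features: list of object feature vectors
    edges:    dict mapping (i, j) -> relation embedding for the edge i -> j
    question: question embedding used to score each relation
    """
    updated = []
    for i, v_i in enumerate(features):
        neighbors = [(j, r) for (src, j), r in edges.items() if src == i]
        if not neighbors:
            # Objects with no outgoing relations keep their features.
            updated.append(list(v_i))
            continue
        # Assumed cross-modal attention: question scores each relation.
        alphas = softmax([dot(question, r) for _, r in neighbors])
        new_v = list(v_i)
        for alpha, (j, r) in zip(alphas, neighbors):
            # Assumed message: neighbor feature plus relation embedding.
            message = [x + y for x, y in zip(features[j], r)]
            # Assumed gate: how much of the message object i lets through.
            gate = sigmoid(dot(v_i, message))
            for k in range(len(new_v)):
                new_v[k] += alpha * gate * message[k]
        updated.append(new_v)
    return updated

# Toy example: three objects, two relations leaving object 0.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {(0, 1): [0.5, 0.5], (0, 2): [-0.5, 0.0]}
question = [1.0, 1.0]
out = gated_relation_update(features, edges, question)
```

In this toy setup only object 0 has outgoing relations, so only its feature vector changes; the relation whose embedding aligns better with the question receives the larger attention weight.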
