{"title":"客体差异注意:一种简单的视觉问答关系注意","authors":"Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong","doi":"10.1145/3240508.3240513","DOIUrl":null,"url":null,"abstract":"Attention mechanism has greatly promoted the development of Visual Question Answering (VQA). Attention distribution, which weights differently on objects (such as image regions or bounding boxes) in an image according to their importance for answering a question, plays a crucial role in attention mechanism. Most of the existing work focuses on fusing image features and text features to calculate the attention distribution without comparisons between different image objects. As a major property of attention, selectivity depends on comparisons between different objects. Comparisons provide more information for assigning attentions better. For achieving this, we propose an object-difference attention (ODA) which calculates the probability of attention by implementing difference operator between different image objects in an image under the guidance of questions in hand. Experimental results on three publicly available datasets show our ODA based VQA model achieves the state-of-the-art results. Furthermore, a general form of relational attention is proposed. Besides ODA, several other relational attentions are given. Experimental results show those relational attentions have strengths on different types of questions.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":"{\"title\":\"Object-Difference Attention: A Simple Relational Attention for Visual Question Answering\",\"authors\":\"Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong\",\"doi\":\"10.1145/3240508.3240513\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Attention mechanism has greatly promoted the development of Visual Question Answering (VQA). Attention distribution, which weights differently on objects (such as image regions or bounding boxes) in an image according to their importance for answering a question, plays a crucial role in attention mechanism. Most of the existing work focuses on fusing image features and text features to calculate the attention distribution without comparisons between different image objects. As a major property of attention, selectivity depends on comparisons between different objects. Comparisons provide more information for assigning attentions better. For achieving this, we propose an object-difference attention (ODA) which calculates the probability of attention by implementing difference operator between different image objects in an image under the guidance of questions in hand. Experimental results on three publicly available datasets show our ODA based VQA model achieves the state-of-the-art results. Furthermore, a general form of relational attention is proposed. Besides ODA, several other relational attentions are given. 
Experimental results show those relational attentions have strengths on different types of questions.\",\"PeriodicalId\":339857,\"journal\":{\"name\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"36\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3240508.3240513\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3240508.3240513","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Object-Difference Attention: A Simple Relational Attention for Visual Question Answering
The attention mechanism has greatly promoted the development of Visual Question Answering (VQA). The attention distribution, which weights objects in an image (such as image regions or bounding boxes) according to their importance for answering a question, plays a crucial role in the attention mechanism. Most existing work focuses on fusing image features and text features to calculate the attention distribution, without comparing different image objects. Yet selectivity, a major property of attention, depends on comparisons between different objects: comparisons provide more information for assigning attention well. To exploit this, we propose Object-Difference Attention (ODA), which calculates attention probabilities by applying a difference operator between the image objects in an image under the guidance of the question at hand. Experimental results on three publicly available datasets show that our ODA-based VQA model achieves state-of-the-art results. Furthermore, a general form of relational attention is proposed, and several relational attentions besides ODA are given. Experimental results show that these relational attentions have strengths on different types of questions.
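To make the mechanism concrete, below is a minimal PyTorch sketch of the object-difference idea as the abstract describes it: pairwise differences between object features, guided by the question, scored and normalized into an attention distribution. The elementwise question guidance, the learned projection W, and the function name object_difference_attention are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def object_difference_attention(v, q, W):
    """Hypothetical sketch of object-difference attention (ODA).

    v : (N, d)  image object features (e.g., region/bounding-box features)
    q : (d,)    question embedding
    W : (d, 1)  learned projection

    Scores each object by summing its question-guided differences
    against every other object, then normalizes with a softmax.
    """
    # Pairwise differences between objects: (N, N, d)
    diff = v.unsqueeze(1) - v.unsqueeze(0)
    # Guide each difference by the question (elementwise product),
    # then project each guided difference to a scalar score: (N, N)
    scores = (diff * q).matmul(W).squeeze(-1)
    # Aggregate the comparisons of object i against all other objects: (N,)
    logits = scores.sum(dim=1)
    # Attention distribution over the N objects
    return F.softmax(logits, dim=0)
```

For example, with N = 36 region features of dimension d = 512, calling the function with v = torch.randn(36, 512), q = torch.randn(512), and W = torch.randn(512, 1) yields a length-36 distribution that sums to 1.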