Spatial Relational Attention Using Fully Convolutional Networks for Image Caption Generation

Teng Jiang, Liang Gong, Yupu Yang
{"title":"Spatial Relational Attention Using Fully Convolutional Networks for Image Caption Generation","authors":"Teng Jiang, Liang Gong, Yupu Yang","doi":"10.1142/s146902682050011x","DOIUrl":null,"url":null,"abstract":"Attention-based encoder–decoder framework has greatly improved image caption generation tasks. The attention mechanism plays a transitional role by transforming static image features into sequential captions. To generate reasonable captions, it is of great significance to detect spatial characteristics of images. In this paper, we propose a spatial relational attention approach to consider spatial positions and attributes. Image features are firstly weighted by the attention mechanism. Then they are concatenated with contextual features to form a spatial–visual tensor. The tensor is feature extracted by a fully convolutional network to produce visual concepts for the decoder network. The fully convolutional layers maintain spatial topology of images. Experiments conducted on the three benchmark datasets, namely Flickr8k, Flickr30k and MSCOCO, demonstrate the effectiveness of our proposed approach. Captions generated by the spatial relational attention method precisely capture spatial relations of objects.","PeriodicalId":422521,"journal":{"name":"Int. J. Comput. Intell. Appl.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Intell. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s146902682050011x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Attention-based encoder–decoder framework has greatly improved image caption generation tasks. The attention mechanism plays a transitional role by transforming static image features into sequential captions. To generate reasonable captions, it is of great significance to detect spatial characteristics of images. In this paper, we propose a spatial relational attention approach to consider spatial positions and attributes. Image features are firstly weighted by the attention mechanism. Then they are concatenated with contextual features to form a spatial–visual tensor. The tensor is feature extracted by a fully convolutional network to produce visual concepts for the decoder network. The fully convolutional layers maintain spatial topology of images. Experiments conducted on the three benchmark datasets, namely Flickr8k, Flickr30k and MSCOCO, demonstrate the effectiveness of our proposed approach. Captions generated by the spatial relational attention method precisely capture spatial relations of objects.
使用全卷积网络生成图像标题的空间关系注意
基于注意力的编码器-解码器框架极大地改善了图像标题生成任务。注意机制通过将静态图像特征转化为顺序字幕,起到过渡作用。为了生成合理的字幕,检测图像的空间特征是非常重要的。本文提出了一种考虑空间位置和属性的空间关系注意方法。首先通过注意机制对图像特征进行加权。然后将它们与上下文特征连接起来,形成一个空间视觉张量。张量由全卷积网络提取特征,为解码器网络生成视觉概念。全卷积层保持图像的空间拓扑结构。在Flickr8k、Flickr30k和MSCOCO三个基准数据集上进行的实验证明了本文方法的有效性。空间关系关注法生成的标题能够准确捕捉物体的空间关系。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信