Improvement of attention modules for image captioning using pixel-wise semantic information

Zhihao Chen, Keisuke Doman, Y. Mekada
International Conference on Digital Image Processing. Published 2022-10-12. DOI: 10.1117/12.2644743 (https://doi.org/10.1117/12.2644743)

Abstract

Attention mechanisms are a reasonable choice for generating image captions, but in practice it is hard to make them attend to the ideal image regions, because the attention must be calculated between image data and text data. To improve the attention modules used in image captioning, we propose an algorithm that exploits pixel-wise semantic information obtained from the output of semantic segmentation. The proposed method feeds this pixel-wise semantic information into the attention modules together with the input text data and the image features. In evaluation experiments, our method produced more reasonable weighted image features and better captions, reaching a BLEU-4 score of 0.306 compared with 0.243 for the original attention model.
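
The abstract does not specify how the segmentation output enters the attention module, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a standard additive (soft) attention over a grid of image regions, and adds a per-region class histogram pooled from a segmentation map as a third input next to the image features and the decoder's text state. All names (`pool_segmentation`, `semantic_attention`, `Wv`, `Ws`, `Wh`, `w`), the grid pooling, and the random weights are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def pool_segmentation(seg_map, num_classes, grid):
    """Turn an (H, W) map of class ids into one class histogram per grid cell.

    Assumption: regions form a regular gh x gw grid aligned with the feature map.
    """
    H, W = seg_map.shape
    gh, gw = grid
    hist = np.zeros((gh * gw, num_classes))
    for i in range(gh):
        for j in range(gw):
            cell = seg_map[i * H // gh:(i + 1) * H // gh,
                           j * W // gw:(j + 1) * W // gw]
            counts = np.bincount(cell.ravel(), minlength=num_classes)
            hist[i * gw + j] = counts / counts.sum()
    return hist

def semantic_attention(V, S, h, Wv, Ws, Wh, w):
    """Additive attention whose scores also depend on semantic histograms S.

    V: (R, D) region features, S: (R, C) class histograms, h: (Hd,) text state.
    Returns attention weights (R,) and the weighted image feature (D,).
    """
    scores = np.tanh(V @ Wv + S @ Ws + h @ Wh) @ w
    alpha = softmax(scores)
    return alpha, alpha @ V

# Toy example with random inputs and untrained weights (illustration only).
rng = np.random.default_rng(0)
num_classes, grid = 3, (2, 2)
R, D, Hd, A = grid[0] * grid[1], 5, 4, 6

seg_map = rng.integers(0, num_classes, size=(8, 8))  # stand-in segmentation output
S = pool_segmentation(seg_map, num_classes, grid)
V = rng.normal(size=(R, D))                          # stand-in CNN region features
h = rng.normal(size=Hd)                              # stand-in decoder hidden state
Wv, Ws, Wh = (rng.normal(size=(n, A)) for n in (D, num_classes, Hd))
w = rng.normal(size=A)

alpha, context = semantic_attention(V, S, h, Wv, Ws, Wh, w)
print(alpha.round(3), context.shape)
```

In this sketch, dropping the `S @ Ws` term recovers a plain additive attention over image features and text state, which is the kind of baseline the abstract compares against.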