Double awareness mechanism based deep learning framework for image captioning

Journal: Journal of Discrete Mathematical Sciences and Cryptography
Publication date: 2023-01-01
Publication type: Journal Article
DOI: 10.47974/jdmsc-1728 (https://doi.org/10.47974/jdmsc-1728)

Abstract

Image captioning is the process of understanding the content of an image and converting it into a meaningful phrase or sentence. Neuroscience research has only recently clarified the connection between human vision and language formation. Although many methods for captioning images exist, including content retrieval and template filling, the current trend is toward deep learning-based methods. In these methods, an image encoder produces feature vectors from an image, and a language decoder converts those feature vectors into a sequence of words. Plain encoder-decoder models and simple attention-based models have not produced sufficiently good results. The proposed model therefore uses a double awareness-based mechanism. The primary goal of this study is to extract visual features from the regions of interest (RoI) of an image, together with text features obtained using the GloVe embedding technique. The Inception-ResNet variant of the convolutional neural network (CNN) is used as the encoder, and a gated recurrent unit (GRU) is employed as the decoder. The proposed model is tested on the Flickr8k dataset, and the results indicate that the double awareness mechanism is highly effective.
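The abstract does not spell out how the double awareness mechanism is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the decoder state attends once over image region features and once over embeddings of the words generated so far, and that both context vectors feed a standard GRU update. The scaled dot-product scoring, the dimensions, the random stand-in vectors, and every function name here are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Scaled dot-product attention (assumed scoring function).

    Returns the weighted sum of `items` and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    context = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(len(query))]
    return context, weights


def gru_step(x, h, params):
    """One GRU update with h' = (1 - z) * h + z * candidate (biases omitted)."""
    (Wz, Uz), (Wr, Ur), (Wh, Uh) = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(Wh, x), matvec(Uh, gated_h))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 4  # toy size; real CNN features and GloVe vectors are much larger

regions = rand_matrix(3, DIM, rng)    # stand-in for CNN RoI features
word_vecs = rand_matrix(2, DIM, rng)  # stand-in for GloVe vectors of past words
hidden = [0.1] * DIM                  # current decoder state
params = [(rand_matrix(DIM, 2 * DIM, rng), rand_matrix(DIM, DIM, rng))
          for _ in range(3)]          # untrained weights for z, r, candidate

# Two attention passes, both conditioned on the decoder state.
visual_ctx, visual_w = attend(hidden, regions)
text_ctx, text_w = attend(hidden, word_vecs)

# The concatenated contexts form the GRU input for this decoding step.
new_hidden = gru_step(visual_ctx + text_ctx, hidden, params)
```

In a trained captioning model, `new_hidden` would then be projected onto the vocabulary to choose the next word; that projection and the training loop are left out of this sketch.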