Double awareness mechanism based deep learning framework for image captioning

Journal of Discrete Mathematical Sciences and Cryptography Pub Date : 2023-01-01 DOI:10.47974/jdmsc-1728


Citations: 0

Abstract

Image captioning is the process of understanding the content of an image and expressing it as a meaningful phrase or sentence. Neuroscience research has only recently clarified the connection between human vision and language formation. Although many captioning methods exist, including content retrieval and template filling, the current trend is toward deep learning-based methods. In these methods, an image encoder produces feature vectors from an image, and a language decoder converts those feature vectors into a sequence of words. Plain encoder-decoder approaches and simple attention-based approaches have not produced sufficiently good results. The proposed model therefore uses a double awareness-based mechanism. The primary goal of this study is to extract visual features from the regions of interest (RoI) of an image, as well as text features using the GloVe embedding technique. The Inception-ResNet variant of the convolutional neural network (CNN) is used as the encoder, and a gated recurrent unit (GRU) is employed as the decoder. The proposed model is tested on the Flickr8k dataset, and the authors report that the results achieved with the double awareness mechanism are highly effective.
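The abstract does not spell out how the double awareness mechanism is wired, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "double awareness" means two soft-attention passes per decoding step, one over RoI feature vectors and one over embeddings of the words generated so far, whose context vectors are concatenated and fed to a GRU cell. All names (`attend`, `gru_step`, `decode_step`, `init_params`) are hypothetical, the weights are random rather than trained, and random vectors stand in for Inception-ResNet features and GloVe embeddings.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, items):
    """Dot-product soft attention: weights over items and their weighted average."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]
    return context, weights

def gru_step(x, h, p):
    """One GRU cell update (biases omitted for brevity)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h))]
    # Convention used here: z close to 1 moves the state toward the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(dim, rng):
    # The GRU input is [visual context ; text context], so its width is 2 * dim.
    params = {name: random_matrix(dim, 2 * dim, rng) for name in ("Wz", "Wr", "Wh")}
    params.update({name: random_matrix(dim, dim, rng) for name in ("Uz", "Ur", "Uh")})
    return params

def decode_step(h, region_feats, word_embs, params):
    """Assumed 'double awareness' step: attend to regions and to words, then update the GRU."""
    visual_ctx, visual_w = attend(h, region_feats)
    text_ctx, text_w = attend(h, word_embs)
    h_next = gru_step(visual_ctx + text_ctx, h, params)
    return h_next, visual_w, text_w

# Toy data: random stand-ins for RoI features and word embeddings.
rng = random.Random(0)
DIM = 4  # assumption: hidden, feature, and embedding sizes are all equal
regions = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
words = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(2)]
params = init_params(DIM, rng)
h0 = [0.0] * DIM
h1, visual_w, text_w = decode_step(h0, regions, words, params)
print(h1, visual_w, text_w)
```

Using the hidden state as the query for both attention passes keeps the sketch short. A real model would typically project features, embeddings, and the hidden state into a shared space with learned matrices and add an output layer over the vocabulary.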


