Social Image Annotation Based on Image Captioning

Haiyu Yang, Haiyu Song, Wei Li, Kexin Qin, Haoyue Shi, Qi Jiao
WSEAS TRANSACTIONS ON SIGNAL PROCESSING, published 2022-05-19. DOI: 10.37394/232014.2022.18.15 (https://doi.org/10.37394/232014.2022.18.15)

Abstract

With the popularity of new social media, automatic image annotation (AIA) has become an active research topic due to its importance in image retrieval, understanding, and management. Despite their relative success, most annotation models suffer from low-level visual representations and the semantic gap. To address these shortcomings, we propose a novel annotation method that uses textual features generated by image captioning, in contrast to previous methods that use visual features to represent images. In our method, each image is represented as a label vector of k user-provided textual tags rather than as a visual vector. Our method proceeds as follows. First, visual features are extracted by combining a deep residual network with an object detection model, and these features are encoded and decoded by a mesh-connected Transformer network. Then, the textual modal feature vector of the image is constructed by removing stop words and retaining high-frequency tags. Finally, this textual feature vector is fed into the propagation annotation model to generate high-quality annotation labels. Experimental results on the standard MS-COCO dataset demonstrate that the proposed method significantly outperforms existing classical models, mainly owing to the proposed textual features generated by image captioning.
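The abstract's two textual steps (building a k-dimensional tag vector from caption words, then propagating labels from similar images) can be sketched in Python. The paper does not publish its implementation, so the stop-word list, the binary vector encoding, and the cosine-weighted propagation below are illustrative assumptions, not the authors' exact model:

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper does not specify its preprocessing.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "with"}

def build_vocab(captions, k):
    """Keep the k most frequent non-stop-word tags across all captions."""
    counts = Counter(
        w for cap in captions for w in cap.lower().split()
        if w not in STOP_WORDS
    )
    return [w for w, _ in counts.most_common(k)]

def textual_vector(caption, vocab):
    """Binary k-dimensional label vector: 1 if the tag appears in the caption."""
    words = set(caption.lower().split())
    return [1 if t in words else 0 for t in vocab]

def cosine(u, v):
    """Cosine similarity between two tag vectors (0.0 if either is empty)."""
    num = sum(a * b for a, b in zip(u, v))
    du = math.sqrt(sum(a * a for a in u))
    dv = math.sqrt(sum(b * b for b in v))
    return num / (du * dv) if du and dv else 0.0

def propagate(query_vec, train_vecs, vocab, top_n=3):
    """Similarity-weighted label propagation: each training image votes for
    its tags with weight equal to its cosine similarity to the query."""
    scores = [0.0] * len(vocab)
    for tv in train_vecs:
        w = cosine(query_vec, tv)
        for i, bit in enumerate(tv):
            if bit:
                scores[i] += w
    ranked = sorted(range(len(vocab)), key=lambda i: -scores[i])
    return [vocab[i] for i in ranked[:top_n]]
```

A usage pass would build the vocabulary from training captions, encode every image as a binary tag vector, and annotate a new image by propagating the highest-scoring tags from its most similar neighbors; the paper's actual propagation model may weight neighbors differently.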