Improving Diversity of Image Captioning Through Variational Autoencoders and Adversarial Learning

Li Ren, Guo-Jun Qi, K. Hua
{"title":"Improving Diversity of Image Captioning Through Variational Autoencoders and Adversarial Learning","authors":"Li Ren, Guo-Jun Qi, K. Hua","doi":"10.1109/WACV.2019.00034","DOIUrl":null,"url":null,"abstract":"Learning translation from images to human-readable natural language has become a great challenge in computer vision research in recent years. Existing works explore the semantic correlation between the visual and language domains via encoder-to-decoder learning frameworks based on classifying visual features in the language domain. This approach, however, is criticized for its lacking of naturalness and diversity. In this paper, we demonstrate a novel way to learn a semantic connection between visual information and natural language directly based on a Variational Autoencoder (VAE) that is trained in an adversarial routine. Instead of using the classification based discriminator, our method directly learns to estimate the diversity between a hidden vector embedded from a text encoder and an informative feature that is sampled from a learned distribution of the autoencoders. We show that the sentences learned from this matching contains accurate semantic meaning with high diversity in the image captioning task. Our experiments on the popular MSCOCO dataset indicates that our method learns to generate high-quality natural language with competitive scores on both correctness and diversity.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV.2019.00034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Learning translation from images to human-readable natural language has become a great challenge in computer vision research in recent years. Existing works explore the semantic correlation between the visual and language domains via encoder-to-decoder learning frameworks based on classifying visual features in the language domain. This approach, however, is criticized for its lacking of naturalness and diversity. In this paper, we demonstrate a novel way to learn a semantic connection between visual information and natural language directly based on a Variational Autoencoder (VAE) that is trained in an adversarial routine. Instead of using the classification based discriminator, our method directly learns to estimate the diversity between a hidden vector embedded from a text encoder and an informative feature that is sampled from a learned distribution of the autoencoders. We show that the sentences learned from this matching contains accurate semantic meaning with high diversity in the image captioning task. Our experiments on the popular MSCOCO dataset indicates that our method learns to generate high-quality natural language with competitive scores on both correctness and diversity.
通过变分自编码器和对抗学习提高图像标题的多样性
学习从图像到人类可读的自然语言的翻译是近年来计算机视觉研究的一大挑战。现有的研究通过基于语言领域视觉特征分类的编码器到解码器学习框架来探索视觉和语言领域之间的语义相关性。然而,这种方法因缺乏自然性和多样性而受到批评。在本文中,我们展示了一种新的方法来直接学习视觉信息和自然语言之间的语义连接,该方法基于在对抗例程中训练的变分自编码器(VAE)。我们的方法不是使用基于分类的鉴别器,而是直接学习估计从文本编码器嵌入的隐藏向量与从自编码器的学习分布中采样的信息特征之间的多样性。在图像字幕任务中,我们证明了从这种匹配中学习到的句子包含了准确的语义和高度的多样性。我们在流行的MSCOCO数据集上的实验表明,我们的方法可以学习生成高质量的自然语言,在正确性和多样性方面都具有竞争力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信