Semantic enhancement methods for image captioning

International Conference on Artificial Intelligence, Virtual Reality, and Visualization. Pub Date: 2023-03-01. DOI: 10.1117/12.2667270

Luming Cui, Lin Li


Citations: 0


Abstract

Image captioning, a cross-modal task, aims to generate a description for a given image and plays an important role in fields such as image retrieval and computer-assisted instruction. Currently, the main challenge in image captioning is the limited quality of the generated descriptions, including insufficient use of image feature information and the limited language learning ability of the decoder. In this paper, we address these problems by constructing a semantic enhancement module and a multi-round decoding mechanism that enhance the decoding ability of the model, which uses the Transformer as its primary structure. To validate the efficacy of the model, we conducted extensive experiments on the MSCOCO2014 benchmark and evaluated its performance using five evaluation metrics. The experimental results show that the proposed method improves to varying degrees on all five evaluation metrics.

