{"title":"基于G-AoANet的图像描述生成方法研究","authors":"Pi Qiao, Ruixue Shen, Yuan Li","doi":"10.1145/3573942.3574072","DOIUrl":null,"url":null,"abstract":"Most of the image description generation methods in the attention-based encoder-decoder framework extract local features from images. Despite the relatively high semantic level of local features, it still has two problems to be solved, one is object loss, where some important objects may be lost when generating image descriptions, and the other is prediction error, as an object may be identified in the wrong class. In this paper, a G-AoANet model is proposed to solve the above problems. The model uses an attention mechanism to combine global features with local features. In this way, our model can selectively focus on both object and contextual information, improving the quality of the generated descriptions. Experimental results show that the model improves the initially reported best CIDEr-D and SPICE scores on the MS COCO dataset by 9.3% and 5.1% respectively.","PeriodicalId":103293,"journal":{"name":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on Image Description Generation Method Based on G-AoANet\",\"authors\":\"Pi Qiao, Ruixue Shen, Yuan Li\",\"doi\":\"10.1145/3573942.3574072\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Most of the image description generation methods in the attention-based encoder-decoder framework extract local features from images. 
Despite the relatively high semantic level of local features, it still has two problems to be solved, one is object loss, where some important objects may be lost when generating image descriptions, and the other is prediction error, as an object may be identified in the wrong class. In this paper, a G-AoANet model is proposed to solve the above problems. The model uses an attention mechanism to combine global features with local features. In this way, our model can selectively focus on both object and contextual information, improving the quality of the generated descriptions. Experimental results show that the model improves the initially reported best CIDEr-D and SPICE scores on the MS COCO dataset by 9.3% and 5.1% respectively.\",\"PeriodicalId\":103293,\"journal\":{\"name\":\"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3573942.3574072\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern 
Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3573942.3574072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Research on Image Description Generation Method Based on G-AoANet
Most image description generation methods within the attention-based encoder-decoder framework extract local features from images. Although local features carry a relatively high semantic level, two problems remain to be solved: object loss, where important objects may be omitted from the generated description, and prediction error, where an object may be assigned to the wrong class. This paper proposes a G-AoANet model to address these problems. The model uses an attention mechanism to combine global features with local features, allowing it to selectively focus on both object and contextual information and improving the quality of the generated descriptions. Experimental results show that the model improves the initially reported best CIDEr-D and SPICE scores on the MS COCO dataset by 9.3% and 5.1%, respectively.
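The fusion idea in the abstract, combining a global image feature with attended local region features, can be sketched as a simple scaled dot-product attention step. This is an illustrative NumPy sketch only, not the authors' actual G-AoANet architecture: the function name `fuse_global_local`, the use of the global vector as the attention query, and the final concatenation are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_global_local(global_feat, local_feats):
    """Illustrative fusion: attend over k local region features using the
    global feature as the query, then concatenate the global feature with
    the attended local context (a hypothetical fusion, not the paper's)."""
    d = global_feat.shape[-1]
    scores = local_feats @ global_feat / np.sqrt(d)  # (k,) similarity scores
    weights = softmax(scores)                        # attention over k regions
    context = weights @ local_feats                  # (d,) weighted local context
    return np.concatenate([global_feat, context])    # (2d,) fused representation

# Toy example: one global vector and 5 local region vectors of dimension 8.
rng = np.random.default_rng(0)
g = rng.standard_normal(8)
L = rng.standard_normal((5, 8))
fused = fuse_global_local(g, L)
print(fused.shape)  # (16,)
```

With this kind of fusion, the decoder sees both scene-level context (reducing object loss) and region-level detail (reducing class-prediction errors), which matches the motivation stated in the abstract.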