Semantic enhancement methods for image captioning
Luming Cui, Lin Li
International Conference on Artificial Intelligence, Virtual Reality, and Visualization, 2023-03-01
DOI: https://doi.org/10.1117/12.2667270
Image captioning is a cross-modal task that aims to generate a description for a given image; it plays an important role in fields such as image retrieval and computer-assisted instruction. The main challenge today is the limited quality of generated descriptions, in particular the insufficient use of image feature information and the limited language-learning ability of the decoder. In this paper, we address both problems with a Transformer-based model that adds a semantic enhancement module and a multi-round decoding mechanism to strengthen decoding. To validate the model, we conducted extensive experiments on the MSCOCO2014 benchmark and evaluated performance with five evaluation metrics. The results show that the proposed method improves on all five metrics to varying degrees.