An Augmented Embedding Spaces approach for Text-based Image Captioning

Doanh C. Bui, Truc Trinh, Nguyen D. Vo, Khang Nguyen

2021 8th NAFOSTED Conference on Information and Computer Science (NICS), 2021-12-21. DOI: 10.1109/NICS54270.2021.9701576

Abstract: Scene-text-based image captioning is the task of generating a caption for an input image using both the image context and scene-text information. To improve performance on this task, we propose two modules, Objects-augmented and Grid-features augmentation, which enhance spatial-location information and global image understanding on top of the M4C-Captioner architecture. Experimental results on the TextCaps dataset show that our method outperforms the M4C-Captioner baseline. Our best results on the standard test set are 20.02% BLEU4 and 85.64% CIDEr.