Adaptive Attention Generation for Indonesian Image Captioning

Made Raharja Surya Mahadi, A. Arifianto, Kurniawan Nur Ramadhani
DOI: 10.1109/ICoICT49345.2020.9166244
Published in: 2020 8th International Conference on Information and Communication Technology (ICoICT), June 2020
Citations: 8

Abstract

Image captioning is one of the most widely discussed topics today. However, most research in this area generates English captions, while thousands of languages exist around the world. Given each language's uniqueness, dedicated research is needed to generate captions in those languages. Indonesia, the largest Southeast Asian country, has its own language, Bahasa Indonesia, which is taught in countries such as Vietnam, Australia, and Japan. In this research, we propose an attention-based image captioning model using ResNet101 as the encoder and an LSTM with adaptive attention as the decoder for the Indonesian image captioning task. Adaptive attention decides when, and at which region of the image, the model should attend to produce the next word. The model was trained on the MSCOCO and Flickr30k datasets, both translated into Bahasa Indonesia manually by humans and by Google Translate. Our research achieved scores of 0.678, 0.512, 0.375, 0.274, and 0.990 for BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr, respectively. Our model also produces scores similar to those of English image captioning models, which suggests it can be equivalent to English image captioning. We also propose a new metric based on a survey, whose results show that 76.8% of our model's captions are rated better than validation data translated with Google Translate.
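The core of the decoder described above is an adaptive-attention step: at each word, a "visual sentinel" lets the LSTM fall back on its language model instead of the image. A minimal NumPy sketch of that step is shown below, following the standard visual-sentinel formulation of adaptive attention; all variable and weight names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention_step(V, h, s, W_v, W_g, W_s, w_h):
    """One adaptive-attention step (illustrative names).

    V   : (k, d)  spatial image features from the CNN encoder
    h   : (d,)    current LSTM hidden state
    s   : (d,)    visual sentinel (what the language model already 'knows')
    W_v, W_g, W_s : (d, m) projection matrices; w_h : (m,) scoring vector

    Returns the blended context vector and the sentinel gate beta.
    """
    # Attention scores over the k image regions
    z = np.tanh(V @ W_v + h @ W_g) @ w_h              # (k,)
    # Score for the sentinel, treated as a (k+1)-th attendable slot
    z_s = np.tanh(s @ W_s + h @ W_g) @ w_h            # scalar
    alpha = softmax(np.append(z, z_s))                # (k+1,), sums to 1
    beta = alpha[-1]                                  # gate in [0, 1]
    c = (alpha[:-1][:, None] * V).sum(axis=0)         # visual context (d,)
    # beta -> rely on the language model; (1 - beta) -> rely on the image
    return beta * s + (1 - beta) * c, beta
```

When beta is near 1 the model generates the next word mostly from language context (e.g. function words); when beta is near 0 it attends to image regions, which is the "when and where to attend" behavior the abstract describes.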