The Use of Machine Translation to Provide Resources for Under-Resourced Languages - Image Captioning Task
Basem H. A. Ahmed, Motaz K. Saad
2021 Palestinian International Conference on Information and Communication Technology (PICICT), published 2021-09-01
DOI: 10.1109/PICICT53635.2021.00017
Image captioning is an NLP task with many applications, such as image search and retrieval. The task is challenging and requires large amounts of data (images paired with text captions), which may not be available for some languages. In this work, we investigate the use of a machine translation system to provide resources for a low-resourced language (Arabic) for the image captioning task. We train a model on captions automatically translated with the Google machine translation service. Performance is measured using the BLEU, ROUGE, CIDEr, and METEOR metrics and compared with that of an English model. We also evaluate the generated captions against manually translated captions. The results show that machine translation can be good enough for creating resources for low-resourced languages for the image captioning task, and that translating the training data and building a new model is better than translating the model's output.
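
As a rough illustration of the n-gram overlap scoring that underlies BLEU, one of the metrics named above, the following minimal sketch computes clipped n-gram precision and a simplified sentence-level BLEU score. It is not the evaluation code used in the paper: the function names are hypothetical, captions are assumed to be pre-tokenized lists of words, no smoothing is applied, and `max_n` defaults to 2 here, whereas BLEU is commonly reported with n-grams up to length 4.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate caption against reference captions."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=2):
    """Simplified, unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    # Toy example: a generated caption scored against one reference caption.
    generated = "a man rides a horse".split()
    reference = ["a man is riding a horse".split()]
    print(sentence_bleu(generated, reference))
```

The same functions apply to Arabic captions once they are tokenized. Under this kind of scoring, the abstract's two strategies (training on translated captions versus translating an English model's output) would each be scored against the same set of reference captions.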