Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation

Ibrahim Gashaw, H. Shashirekha
{"title":"Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Translation","authors":"Ibrahim Gashaw, H. Shashirekha","doi":"10.5121/ijaia.2020.11107","DOIUrl":null,"url":null,"abstract":"Many automatic translation works have been addressed between major European language pairs, by\n taking advantage of large scale parallel corpora, but very few research works are conducted on the\n Amharic-Arabic language pair due to its parallel data scarcity. However, there is no benchmark parallel\n Amharic-Arabic text corpora available for Machine Translation task. Therefore, a small parallel Quranic\n text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent translation\n of Amharic language text corpora available on Tanzile. Experiments are carried out on Two Long ShortTerm Memory (LSTM) and Gated Recurrent Units (GRU) based Neural Machine Translation (NMT) using\n Attention-based Encoder-Decoder architecture which is adapted from the open-source OpenNMT system.\n LSTM and GRU based NMT models and Google Translation system are compared and found that LSTM\n based OpenNMT outperforms GRU based OpenNMT and Google Translation system, with a BLEU score\n of 12%, 11%, and 6% respectively.","PeriodicalId":93188,"journal":{"name":"International journal of artificial intelligence & applications","volume":"11 1","pages":"79-91"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.5121/ijaia.2020.11107","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of artificial intelligence & applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5121/ijaia.2020.11107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Many automatic translation works have addressed major European language pairs by taking advantage of large-scale parallel corpora, but very few research works have been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Moreover, there is no benchmark parallel Amharic-Arabic text corpus available for the Machine Translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text corpus and its equivalent Amharic translation available on Tanzile. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM- and GRU-based NMT models and the Google Translation system are compared, and the LSTM-based OpenNMT is found to outperform the GRU-based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
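The architecture referred to above, an attention-based encoder-decoder whose recurrent layer is either an LSTM or a GRU, can be illustrated with a short PyTorch sketch. This is a minimal illustrative example only: the class name, layer sizes, and the dot-product (Luong-style) attention used here are assumptions and do not reproduce the paper's exact OpenNMT configuration or training setup.

import torch
import torch.nn as nn

class Seq2SeqAttention(nn.Module):
    # Encoder-decoder with a switchable recurrent cell (LSTM or GRU)
    # and dot-product attention over the encoder states.
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, cell="LSTM"):
        super().__init__()
        rnn = nn.LSTM if cell == "LSTM" else nn.GRU      # the two compared variants
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = rnn(emb, hidden, batch_first=True)
        self.decoder = rnn(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, tgt_vocab)      # decoder state + attention context

    def forward(self, src_ids, tgt_ids):
        enc_out, enc_state = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), enc_state)
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # attention scores
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))  # vocabulary logits

# Toy usage with random token ids standing in for Arabic source and Amharic target.
model = Seq2SeqAttention(src_vocab=8000, tgt_vocab=8000, cell="GRU")
src = torch.randint(0, 8000, (4, 20))   # batch of 4 source sentences, length 20
tgt = torch.randint(0, 8000, (4, 18))   # shifted target sentences, length 18
logits = model(src, tgt)                # shape (4, 18, 8000)

In practice, the teacher-forced logits above would be trained with cross-entropy against the target tokens, and translation quality would then be compared with BLEU, as the paper does for the LSTM and GRU variants.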