Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation

Ibrahim Gashaw, H. Shashirekha
{"title":"Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Translation","authors":"Ibrahim Gashaw, H. Shashirekha","doi":"10.5121/ijaia.2020.11107","DOIUrl":null,"url":null,"abstract":"Many automatic translation works have been addressed between major European language pairs, by\n taking advantage of large scale parallel corpora, but very few research works are conducted on the\n Amharic-Arabic language pair due to its parallel data scarcity. However, there is no benchmark parallel\n Amharic-Arabic text corpora available for Machine Translation task. Therefore, a small parallel Quranic\n text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent translation\n of Amharic language text corpora available on Tanzile. Experiments are carried out on Two Long ShortTerm Memory (LSTM) and Gated Recurrent Units (GRU) based Neural Machine Translation (NMT) using\n Attention-based Encoder-Decoder architecture which is adapted from the open-source OpenNMT system.\n LSTM and GRU based NMT models and Google Translation system are compared and found that LSTM\n based OpenNMT outperforms GRU based OpenNMT and Google Translation system, with a BLEU score\n of 12%, 11%, and 6% respectively.","PeriodicalId":93188,"journal":{"name":"International journal of artificial intelligence & applications","volume":"11 1","pages":"79-91"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.5121/ijaia.2020.11107","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of artificial intelligence & applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5121/ijaia.2020.11107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Many automatic translation works have addressed major European language pairs by taking advantage of large-scale parallel corpora, but very few research works have been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Moreover, there is no benchmark parallel Amharic-Arabic text corpus available for the Machine Translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text corpus and its equivalent Amharic translation available on Tanzile. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM- and GRU-based NMT models and the Google Translation system are compared, and the LSTM-based OpenNMT is found to outperform the GRU-based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
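The architecture referred to above, an attention-based encoder-decoder whose recurrent layer is either an LSTM or a GRU, can be illustrated with a short PyTorch sketch. This is a minimal illustrative example only: the class name, layer sizes, and the dot-product (Luong-style) attention used here are assumptions and do not reproduce the paper's exact OpenNMT configuration or training setup.

import torch
import torch.nn as nn

class Seq2SeqAttention(nn.Module):
    # Encoder-decoder with a switchable recurrent cell (LSTM or GRU)
    # and dot-product attention over the encoder states.
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512, cell="LSTM"):
        super().__init__()
        rnn = nn.LSTM if cell == "LSTM" else nn.GRU      # the two compared variants
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = rnn(emb, hidden, batch_first=True)
        self.decoder = rnn(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, tgt_vocab)      # decoder state + attention context

    def forward(self, src_ids, tgt_ids):
        enc_out, enc_state = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), enc_state)
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # attention scores
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))  # vocabulary logits

# Toy usage with random token ids standing in for Arabic source and Amharic target.
model = Seq2SeqAttention(src_vocab=8000, tgt_vocab=8000, cell="GRU")
src = torch.randint(0, 8000, (4, 20))   # batch of 4 source sentences, length 20
tgt = torch.randint(0, 8000, (4, 18))   # shifted target sentences, length 18
logits = model(src, tgt)                # shape (4, 18, 8000)

In practice, the teacher-forced logits above would be trained with cross-entropy against the target tokens, and translation quality would then be compared with BLEU, as the paper does for the LSTM and GRU variants.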