A Parallel Corpora for bi-directional Neural Machine Translation for Low Resourced Ethiopian Languages
A. Tonja, Michael Melese Woldeyohannis, Mesay Gemeda Yigezu
2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), published 2021-11-22
DOI: https://doi.org/10.1109/ict4da53266.2021.9672230
Citations: 6
Abstract
In this paper, we describe an effort to develop parallel corpora for neural machine translation between English and Ethiopian languages such as Wolaita, Gamo, Gofa, and Dawuro. The corpus was collected from the religious domain, and to check the usability of the collected parallel corpora, bi-directional neural machine translation experiments were conducted. The baseline experiments show good results, with BLEU scores of 13.8 for Wolaita-English and 8.2 for English-Wolaita translation. Wolaita-English translation gives better results than the other Ethiopian language pairs, and translation quality improves as the amount of data increases, so dataset size has a large impact on performance. In addition, the morphological richness of the Ethiopian languages contributes to the lower performance of neural machine translation when an Ethiopian language is the target language. We are currently working on reducing the effect of morphological richness by applying different morphological processing techniques to the translation of Ethiopian languages.