Mesay Gemeda Yigezu, Michael Melese Woldeyohannis, A. Tonja
Multilingual Neural Machine Translation for Low Resourced Languages: Ometo-English
2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), 2021-11-22
DOI: https://doi.org/10.1109/ict4da53266.2021.9671270
Citations: 8
Abstract
Unlike technologically favored languages, under-resourced languages suffer greatly from a lack of language resources for machine translation. In this paper, we present a new approach to overcoming this resource scarcity by exploiting the significant amount of linguistic resources shared among languages in the Ometo language family. The dataset for the experiments was collected from the religious domain and consists of sentences in four Ometo languages (Wolaita, Gamo, Gofa, and Dawuro), each paired with English sentences and automatically extracted from the web. The collected corpus was used to conduct neural machine translation experiments from Ometo to English. Among the experiments, training on the combination of Wolaita, Dawuro, and Gamo paired with English and testing on Gofa gives the highest BLEU score, 4.5, while training on Wolaita, Gamo, and Gofa and testing on Dawuro gives the lowest. The BLEU score of the machine translation system shows a promising result despite the language differences, morphological richness, and complexity of the Ometo languages, which have a high impact on the performance of Ometo-English machine translation. Further, we are now working towards a translation system that significantly reduces the effect of the morphological richness and complexity of the Ometo languages through different linguistic processing.
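The train/test combinations described in the abstract can be read as a leave-one-language-out design. The following is a minimal sketch of how such splits might be enumerated. It assumes each experiment trains on three Ometo languages and tests on the fourth, and that all four hold-outs were run; the abstract names only two of the combinations explicitly. The function name `leave_one_language_out` is hypothetical and not taken from the paper.

```python
# Illustrative only: enumerates hold-out splits over the four Ometo languages
# named in the abstract. The assumption that every language is held out once
# is ours; the paper's abstract reports only the best and worst combinations.
OMETO_LANGUAGES = ["Wolaita", "Gamo", "Gofa", "Dawuro"]


def leave_one_language_out(languages):
    """Yield (train_languages, test_language) pairs, holding out one language at a time."""
    for held_out in languages:
        train = [lang for lang in languages if lang != held_out]
        yield train, held_out


splits = list(leave_one_language_out(OMETO_LANGUAGES))
for train, test in splits:
    print(f"train: {' + '.join(train)} -> English | test: {test} -> English")
```

Under this reading, the split that holds out Gofa corresponds to the best reported result (BLEU 4.5), and the split that holds out Dawuro corresponds to the lowest.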