Multilingual Neural Machine Translation for Low Resourced Languages: Ometo-English

Mesay Gemeda Yigezu, Michael Melese Woldeyohannis, A. Tonja
DOI: 10.1109/ict4da53266.2021.9671270
Published in: 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)
Publication date: 2021-11-22
Citations: 8

Abstract

Unlike technologically favored languages, under-resourced languages suffer greatly from a lack of language resources for machine translation. In this paper, we present a new approach to addressing this resource scarcity for the Ometo language family, whose languages share a significant amount of linguistic resources. The experimental dataset comes from the religious domain and was automatically extracted from the web; it consists of sentences in four Ometo languages (Wolaita, Gamo, Gofa, and Dawuro), each paired with English. The collected corpus was used to conduct Ometo-to-English neural machine translation experiments. Among the experiments, training on Wolaita, Dawuro, and Gamo paired with English and testing on Gofa gave the highest BLEU score (4.5) of all combinations, while training on Wolaita, Gamo, and Gofa and testing on Dawuro gave the lowest. The BLEU scores of the machine translation system are promising despite the differences between the languages and the morphological richness and complexity of the Ometo languages, which strongly affect the performance of Ometo-English machine translation. We are now working toward a translation system that significantly reduces the effect of this morphological richness and complexity through various linguistic processing techniques.
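The evaluation design described above (train on three Ometo-English language pairs, test on the fourth, held-out language) can be sketched as a leave-one-language-out split. This is a minimal illustration, not the authors' code: representing each corpus as a list of (Ometo sentence, English sentence) string pairs, combining training corpora by simple concatenation, and the helper name `leave_one_language_out` are all assumptions, since the abstract does not say how the training data were merged or exactly which combinations were run.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# The four Ometo languages named in the abstract.
OMETO_LANGUAGES = ["Wolaita", "Gamo", "Gofa", "Dawuro"]

# Assumption: each corpus is a list of (Ometo sentence, English sentence) pairs.
ParallelCorpus = List[Tuple[str, str]]
# (training languages, held-out test language, training pairs, test pairs)
Experiment = Tuple[List[str], str, ParallelCorpus, ParallelCorpus]


def leave_one_language_out(corpora: Dict[str, ParallelCorpus]) -> List[Experiment]:
    """Hypothetical helper: build one experiment per held-out Ometo language.

    Training data for each experiment is the plain concatenation of the
    other languages' pairs (an assumption; the paper may combine them differently).
    """
    languages = sorted(corpora)
    experiments: List[Experiment] = []
    for train_langs in combinations(languages, len(languages) - 1):
        # Exactly one language is missing from each (n-1)-sized combination.
        (test_lang,) = [lang for lang in languages if lang not in train_langs]
        train = [pair for lang in train_langs for pair in corpora[lang]]
        test = list(corpora[test_lang])
        experiments.append((list(train_langs), test_lang, train, test))
    return experiments


if __name__ == "__main__":
    # Toy placeholder pairs for illustration only; NOT data from the paper.
    toy = {
        lang: [(f"{lang} sentence {i}", f"English sentence {i}") for i in range(2)]
        for lang in OMETO_LANGUAGES
    }
    for train_langs, test_lang, train, test in leave_one_language_out(toy):
        print(f"train on {train_langs} ({len(train)} pairs) -> test on {test_lang} ({len(test)} pairs)")
```

With four languages this yields four experiments. The split that holds out Gofa is the configuration for which the abstract reports the highest BLEU score (4.5), and the split that holds out Dawuro is the one it reports as lowest.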