Increasing SMT and NMT Performance by Corpus Extension with Free Online Machine Translation Services

Zar Zar Hlaing, Ye Kyaw Thu, T. Supnithi, P. Netisopakul
Published in: 2020 International Conference on Advanced Information Technologies (ICAIT)
Publication date: 2020-11-04
DOI: 10.1109/ICAIT51105.2020.9261772 (https://doi.org/10.1109/ICAIT51105.2020.9261772)
Citations: 1

Abstract

In machine translation, parallel corpora for a source-target language pair are essential to translation performance. However, the existing parallel corpora for low-resource languages are not sufficient to improve translation quality. In this paper, we explore the role of corpus extension using three freely available online machine translation services, Google Translate, SYSTRAN Translate, and Yandex Translate, for the English-Thai language pair. We compare the performance of three statistical and neural machine translation systems on the original ASEAN-MT corpus and on its extended versions, which are double the size of the original ASEAN-MT. The results show that, for the SMT models, the extended Thai corpus improves Thai-to-English (th-en) translation performance by up to 2.6%, and the extended English corpus improves English-to-Thai (en-th) translation performance significantly, by up to 4.2%. For the NMT model, the extended Thai corpus improves translation performance by up to 5.5%.
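The corpus-extension idea can be illustrated with a minimal sketch. The abstract does not say how the extended corpora were assembled, so this is one plausible reading rather than the authors' procedure: each English sentence is machine-translated into Thai and the resulting synthetic pair is appended, which doubles the corpus. The names `extend_corpus` and `placeholder_translate` are hypothetical, and the translator is a local stand-in, not a call to any real online service.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (english_sentence, thai_sentence)


def extend_corpus(
    pairs: List[Pair],
    translate_en_to_th: Callable[[str], str],
) -> List[Pair]:
    """Return the original pairs followed by one synthetic pair per sentence.

    Assumption: the Thai side of each synthetic pair is produced by machine
    translation of the English side. The abstract does not confirm this.
    """
    synthetic = [(en, translate_en_to_th(en)) for en, _ in pairs]
    return pairs + synthetic


def placeholder_translate(text: str) -> str:
    # Hypothetical stand-in for an online MT service; it only tags the input.
    return f"<th:{text}>"


corpus = [("hello", "สวัสดี"), ("thank you", "ขอบคุณ")]
extended = extend_corpus(corpus, placeholder_translate)
```

Passing the translator in as a function keeps the sketch independent of any particular service, so a real client could be substituted without changing `extend_corpus`.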