Zar Zar Hlaing, Ye Kyaw Thu, T. Supnithi, P. Netisopakul
2020 International Conference on Advanced Information Technologies (ICAIT), published 2020-11-04. DOI: https://doi.org/10.1109/ICAIT51105.2020.9261772
Increasing SMT and NMT Performance by Corpus Extension with Free Online Machine Translation Services
In machine translation, parallel corpora for a source-target language pair are essential to translation performance. However, the existing parallel corpora for low-resource languages are not sufficient to improve translation quality. In this paper, we explore the role of corpus extension using three freely available online machine translation services, “Google Translate”, “SYSTRAN Translate” and “Yandex Translate”, for the English-Thai language pair. We compare the performance of three statistical and neural machine translation systems trained on the original ASEAN-MT corpus and on its extended versions, each of which doubles the size of the original ASEAN-MT corpus. The results show that, for the SMT models, the extended Thai corpus improves th-en translation performance by up to 2.6%, and the extended English corpus significantly improves en-th translation performance by up to 4.2%. For the NMT model, the extended Thai corpus improves translation performance by up to 5.5%.
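The abstract does not spell out how the extended corpora were assembled, so the following is only a minimal sketch of one plausible reading: each English sentence is machine-translated into Thai, and the resulting synthetic pairs are appended to the original pairs, which doubles the corpus size. The names `extend_thai_side` and `placeholder_translate` are hypothetical, and the placeholder stands in for an online translation service rather than calling any real API.

```python
from typing import Callable, List, Tuple

# One parallel sentence pair: (English sentence, Thai sentence).
Pair = Tuple[str, str]


def extend_thai_side(
    corpus: List[Pair],
    translate_en_to_th: Callable[[str], str],
) -> List[Pair]:
    """Return the original pairs followed by one synthetic pair per original.

    Assumption: the "extended Thai corpus" pairs each original English
    sentence with a machine-translated Thai sentence. The paper's actual
    procedure may differ.
    """
    synthetic = [(en, translate_en_to_th(en)) for en, _ in corpus]
    return corpus + synthetic


def placeholder_translate(sentence: str) -> str:
    """Hypothetical stand-in for an online MT service (no network call)."""
    return f"<th:{sentence}>"


original = [("hello", "สวัสดี"), ("thank you", "ขอบคุณ")]
extended = extend_thai_side(original, placeholder_translate)

# The extended corpus is twice the size of the original.
print(len(original), len(extended))
```

Passing the translator in as a function keeps the sketch independent of any particular service; an English-side extension would mirror it by translating the Thai sentences instead.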