A Study of Three Statistical Machine Translation Methods for Myanmar (Burmese) and Shan (Tai Long) Language Pair

Nang Aeindray Kyaw, Ye Kyaw Thu, Hlaing Myat Nwe, Phyu Phyu Tar, N. Min, T. Supnithi
2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2020-11-18
DOI: 10.1109/iSAI-NLP51646.2020.9376832 (https://doi.org/10.1109/iSAI-NLP51646.2020.9376832)
Citations: 0

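Abstract

The Shan are reportedly the second-largest ethnic group in Myanmar. The main motivation of this work is to break down the communication barrier between Shan people and Myanmar people. This paper presents the first evaluation of machine translation quality between Myanmar (Burmese) and Shan (Tai Long). We also built a Myanmar-Shan parallel corpus of around 11K sentences, based on the Myanmar-language portion of the ASEAN MT corpus. Three statistical machine translation approaches were compared: phrase-based, hierarchical phrase-based, and the operation sequence model. We also studied two segmentation schemes, syllable segmentation and word segmentation; syllable segmentation gave higher translation quality for both Myanmar and Shan. Translation quality was measured with BLEU and RIBES. For Shan-to-Myanmar syllable translation, the operation sequence model gave the highest scores (41.85 BLEU and 0.88031 RIBES). For Myanmar-to-Shan syllable translation, hierarchical phrase-based translation gave the highest BLEU score (34.72) and the operation sequence model gave the highest RIBES score (0.87012). These syllable-segmentation results are promising even with limited data, and we expect this work can be developed into a useful translation system as more data becomes available.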
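Syllable segmentation, which gave the better results above, can be done with simple rules, whereas word segmentation needs a dictionary or a trained segmenter. A common rule for Myanmar-script text is that a syllable starts at a consonant that is neither stacked under a virama (U+1039) nor silenced by an asat (U+103A). The sketch below is a minimal rule-based breaker built on that assumption; it is not the segmenter used in the paper. The function name `syllable_segment` and the character ranges, in particular treating U+1022 and U+1075 to U+1081 as Shan consonants, are assumptions for illustration.

```python
import re

# Assumed character classes for this sketch:
# Burmese consonants U+1000-U+1021, Shan A U+1022, Shan consonants U+1075-U+1081.
_CONSONANTS = "\u1000-\u1022\u1075-\u1081"
_VIRAMA = "\u1039"  # stacking sign: the next consonant joins the current syllable
_ASAT = "\u103A"    # asat ("killer"): the consonant is a syllable coda, not an onset

# One capturing group matching every character (or run) that opens a new syllable.
_SYLLABLE_START = re.compile(
    "("
    + "(?<!" + _VIRAMA + ")[" + _CONSONANTS + "](?![" + _ASAT + _VIRAMA + "])"
    + "|[\u1023-\u102A\u104C-\u104F]"  # independent vowels and symbols
    + "|[\u1040-\u1049]+"              # run of Myanmar digits
    + "|[\u104A\u104B]"                # section marks
    + "|[!-~]+"                        # run of printable ASCII (Latin words, numbers)
    + ")"
)


def syllable_segment(text: str):
    """Split Myanmar-script text into a list of syllables (rule-based sketch)."""
    # Insert a space before each syllable-initial match, then split on whitespace.
    return _SYLLABLE_START.sub(r" \1", text).split()


if __name__ == "__main__":
    print(syllable_segment("မြန်မာ"))  # ['မြန်', 'မာ']
```

Stacked clusters such as ကမ္ဘာ stay in one unit under this rule, and real Shan text may need extra cases (for example other Shan-specific letters) that this sketch does not cover.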
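BLEU and RIBES measure different things. BLEU combines clipped n-gram precisions (n = 1 to 4) as $\mathrm{BLEU} = \mathrm{BP}\cdot\exp\big(\tfrac14\sum_{n=1}^{4}\log p_n\big)$ with $\mathrm{BP} = \min(1, e^{1-r/c})$, while RIBES focuses on word order, $\mathrm{RIBES} = \mathrm{NKT}\cdot P^{\alpha}\cdot \mathrm{BP}^{\beta}$, where NKT is Kendall's tau over aligned word positions normalized to [0, 1] and P is unigram precision. The sketch below implements both under stated assumptions. RIBES uses a simplified alignment (only tokens occurring exactly once in both sentences), unlike the more elaborate alignment in the reference RIBES script. The defaults α = 0.25 and β = 0.10 are assumed, and the function names are hypothetical. The exact scoring scripts behind the reported numbers are not given here.

```python
import math
from collections import Counter
from itertools import combinations


def _ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = _ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in _ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def ribes(hyp, ref, alpha=0.25, beta=0.10):
    """Sentence-level RIBES (0-1) with a simplified one-to-one alignment."""
    if not hyp:
        return 0.0
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    # Simplification: align only tokens that occur exactly once in both sentences.
    ref_positions = [ref.index(t) for t in hyp if hyp_counts[t] == 1 and ref_counts[t] == 1]
    if len(ref_positions) < 2:
        return 0.0
    pairs = list(combinations(ref_positions, 2))  # pairs kept in hypothesis order
    nkt = sum(1 for a, b in pairs if a < b) / len(pairs)  # (tau + 1) / 2 = share of concordant pairs
    precision = len(ref_positions) / len(hyp)  # share of hypothesis tokens that were aligned
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return nkt * precision ** alpha * brevity ** beta
```

Both functions take pre-tokenized lists, so the same output scored on syllables and on words can give different numbers; the segmentation has to be applied before scoring.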