Nang Aeindray Kyaw, Ye Kyaw Thu, Hlaing Myat Nwe, Phyu Phyu Tar, N. Min, T. Supnithi
A Study of Three Statistical Machine Translation Methods for Myanmar (Burmese) and Shan (Tai Long) Language Pair
2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 2020-11-18
DOI: https://doi.org/10.1109/iSAI-NLP51646.2020.9376832
Abstract
The Shan are said to be the second-largest ethnic group in Myanmar. The main motivation of this work is to break down the communication barrier between Shan people and Myanmar people. This paper contributes the first evaluation of machine translation quality between Myanmar (Burmese) and Shan (Tai Long). We also built a Myanmar-Shan parallel corpus (around 11K sentences) based on the Myanmar-language portion of the ASEAN MT corpus. Three statistical machine translation approaches were used in the experiments: phrase-based, hierarchical phrase-based, and the operation sequence model. Furthermore, two segmentation schemes were studied: syllable segmentation and word segmentation. Translating with syllable segmentation achieved higher-quality machine translation for both the Myanmar and Shan languages. BLEU and RIBES scores were used to measure translation performance. The operation sequence model gave the highest scores (41.85 BLEU and 0.88031 RIBES) for Shan-to-Myanmar syllable translation. For Myanmar-to-Shan syllable translation, hierarchical phrase-based machine translation gave the highest BLEU score of 34.72, and the operation sequence model gave the highest RIBES score of 0.87012. The experiments with syllable segmentation produced promising results even with low data resources, and we expect that this work can be developed into a useful translation system as more data becomes available in the future.
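To make the BLEU figures above concrete, the following is a minimal sketch of corpus-level BLEU computed over pre-segmented tokens. It is not the authors' scoring script: the function names `ngrams` and `corpus_bleu` are hypothetical, a single reference per hypothesis and no smoothing are assumed, and the tokens are placeholder letters standing in for syllables or words.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (tokens may be syllables or words)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions for this sketch: one reference per hypothesis, uniform
    n-gram weights up to max_n, and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches per order
    total = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            total[n - 1] += sum(h.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Placeholder tokens; in practice each token would be one syllable or one word.
reference = ["a", "b", "c", "d", "e"]
hypothesis = ["a", "b", "c", "d", "x"]
score = corpus_bleu([hypothesis], [reference])
print(round(score, 2))
```

Because the n-grams are counted over whatever units the segmenter produces, a BLEU score computed on syllables and one computed on words measure overlap of different units, which is worth keeping in mind when comparing the two segmentation schemes.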