Improving Neural Machine Translation for Low-resource English-Myanmar-Thai Language Pairs with SwitchOut Data Augmentation Algorithm

Mya Ei San, Ye Kyaw Thu, T. Supnithi, Sasiporn Usanavasin

2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
DOI: 10.1109/iSAI-NLP56921.2022.9960261
Published: 2022-11-05
Citations: 1
Abstract
To improve the data resources for the low-resource English-Myanmar-Thai language pairs, we build the first parallel medical corpus, named the En-My-Th medical corpus, which consists of 14,592 parallel sentences in total. In this paper, we conduct experiments on the English-Myanmar language pair of the new En-My-Th medical corpus and, in addition, on the English-Thai and Thai-Myanmar language pairs from the existing ASEAN-MT corpus. The SwitchOut data augmentation algorithm and the baseline attention-based sequence-to-sequence model are trained on the aforementioned language pairs in both translation directions. Experimental results show that combining the SwitchOut algorithm with the baseline model outperforms the baseline-only model on most language pairs in both corpora. Furthermore, we investigate the performance of the baseline model and the baseline+SwitchOut model when word dropout at the recurrent layers is added or removed, and find that the baseline+SwitchOut model with dropout gains around +1.0 BLEU4 and GLEU points on some language pairs.
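For readers unfamiliar with SwitchOut (Wang et al., 2018), the core idea is to augment training data by replacing a randomly sampled number of tokens in both the source and target sentences with words drawn from their respective vocabularies. The Python sketch below is a minimal illustration of that idea, not the authors' implementation: the function name `switchout`, the temperature parameter `tau`, and the exact sampling distribution (proportional to C(n, k) * exp(-k / tau)) are simplifying assumptions made here for illustration.

```python
import math
import random

def switchout(tokens, vocab, tau=1.0):
    """Replace a sampled number of tokens with random vocabulary words.

    The number of replaced positions k is drawn from a temperature-controlled
    distribution p(k) proportional to C(n, k) * exp(-k / tau), so a larger
    tau corrupts sentences more aggressively.
    """
    n = len(tokens)
    if n == 0:
        return list(tokens)
    # Unnormalised log-weights for k = 0 .. n replacements.
    logits = [math.log(math.comb(n, k)) - k / tau for k in range(n + 1)]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    k = random.choices(range(n + 1), weights=weights, k=1)[0]
    # Corrupt k randomly chosen positions with uniform vocabulary samples.
    augmented = list(tokens)
    for pos in random.sample(range(n), k):
        augmented[pos] = random.choice(vocab)
    return augmented

# Usage: applied independently to the source and target sides of each
# sentence pair (toy vocabulary shown here purely as an example).
src_vocab = ["the", "a", "patient", "doctor", "has", "fever", "cough"]
augmented_src = switchout("the patient has a fever".split(), src_vocab, tau=0.9)
```

In this sketch both sides of a parallel sentence pair would be passed through the same routine with their own vocabularies, which matches the abstract's description of augmenting the training pairs before feeding them to the attention-based sequence-to-sequence baseline.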