sylbreak4all: Regular Expressions for Syllable Breaking of Nine Major Ethnic Languages of Myanmar
Ye Kyaw Thu, Hlaing Myat New, Hninn Aye Thant, Hay Man Htun, H. Mon, May Myat Myat Khaing, Hsu Pan Oo, Pale Phyu, Nang Aeindray Kyaw, T. Oo, T. Oo, Thet Thet Zin, T. Oo
2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 21 December 2021
DOI: 10.1109/iSAI-NLP54397.2021.9678188
Abstract
Unlike many Western languages, the Myanmar language uses a syllabic writing system and does not put spaces between words. Syllable segmentation is therefore a necessary preprocessing step for natural language processing (NLP) tasks such as grapheme-to-phoneme (g2p) conversion, machine translation, and romanization, among others. In this study, we developed sylbreak4all, a syllable segmentation tool based on regular expression (RE) patterns, for nine major ethnic languages of Myanmar: Burmese, Shan, Pa’o, Pwo Kayin, S’gaw Kayin, Rakhine, Myeik, Dawei, and Mon.
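To illustrate how regular-expression syllable breaking can work, here is a minimal Python sketch for Burmese only. It assumes a common rule for Burmese in Unicode: a new syllable begins at a consonant that is not preceded by the stacking sign (U+1039) and not followed by asat (U+103A) or the stacking sign, or at a standalone character such as an independent vowel, a Myanmar digit, or a section mark. The character classes, the function name `syllable_break`, and the treatment of Latin runs and whitespace are hypothetical choices for illustration; they are not the sylbreak4all patterns for the nine languages.

```python
import re

# Hypothetical, simplified character classes for Burmese (Unicode encoding).
# They are illustrative only and are NOT the patterns used by sylbreak4all.
CONSONANTS = "\u1000-\u1021"  # U+1000 KA .. U+1021 A
ASAT = "\u103A"               # asat ("killer") mark
STACKER = "\u1039"            # virama used to stack consonants
LATIN = "A-Za-z0-9"
# Characters assumed to always start a new unit: independent vowels,
# a few symbols, Myanmar digits, and the section marks U+104A / U+104B.
STANDALONE = (
    "\u1023-\u1027\u1029\u102A\u103F\u104C\u104D\u104F"
    "\u1040-\u1049\u104A\u104B"
)

# A syllable starts at:
#  1. a consonant not preceded by the stacker and not followed by asat or
#     the stacker (so final and stacked consonants stay in their syllable),
#  2. a run of Latin letters or ASCII digits (kept together), or
#  3. a single standalone character.
BREAK_PATTERN = re.compile(
    rf"(?<!{STACKER})[{CONSONANTS}](?![{ASAT}{STACKER}])"
    rf"|[{LATIN}]+"
    rf"|[{STANDALONE}]"
)


def syllable_break(text: str) -> list[str]:
    """Split Burmese text into syllables; whitespace only separates chunks."""
    syllables: list[str] = []
    for chunk in text.split():
        # Every match position is the start of a new syllable.
        starts = [m.start() for m in BREAK_PATTERN.finditer(chunk)]
        if not starts or starts[0] != 0:
            starts.insert(0, 0)  # keep any leading marks in the first piece
        ends = starts[1:] + [len(chunk)]
        syllables.extend(chunk[b:e] for b, e in zip(starts, ends))
    return syllables
```

For example, `syllable_break("မြန်မာ")` returns `['မြန်', 'မာ']`, and `syllable_break("မန္တလေး")` returns `['မန္တ', 'လေး']`: the consonant before the stacking sign and the stacked consonant after it both stay in the first syllable. The design inserts breaks only where a pattern matches, so everything between two matches (medials, vowel signs, tone marks) attaches to the preceding syllable without needing to be listed explicitly.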