An Algorithm for Myanmar Syllable Segmentation based on the Official Standard Myanmar Unicode Text
Sun Thurain Moe, Than Than Nwe
2023 IEEE Conference on Computer Applications (ICCA), 2023-02-27
DOI: https://doi.org/10.1109/ICCA51723.2023.10181391
Citations: 0
Abstract
The Myanmar language and its characters are complex and do not directly resemble those of any other language, so current linguistic and NLP theories do not seem to work well for Myanmar script. Syllable segmentation is a basic and important level of Myanmar NLP. This paper presents a Myanmar Syllable Segmentation (MSS) algorithm based on the Pyidaungsu font, which is currently designated as the official Myanmar Unicode standard. After several trials and the successful removal of ambiguous cases, we obtained a set of syllable segmentation rules that use 16 vowels and one symbol used in consonant conjuncts. We found that the rule set used in our proposed algorithm, which is clear and simple enough for the public to understand, can correctly segment all possible syllable combinations in Myanmar script.
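The abstract does not list the rules themselves, so the following is only a minimal sketch of what rule-based syllable segmentation on Unicode-encoded Myanmar text can look like. It is not the MSS rule set: it uses a common simplification in which a consonant opens a new syllable unless it is marked as a final consonant or is part of a stacked conjunct. The names `BOUNDARY` and `segment_syllables` are hypothetical, and treating U+1039 as the conjunct symbol mentioned in the abstract is an assumption.

```python
import re
from typing import List

# Assumption: plain Unicode code points, not the paper's 16-vowel rule set.
CONSONANTS = "\u1000-\u1021"  # Myanmar consonant letters KA .. A
ASAT = "\u103A"               # marks the preceding consonant as syllable-final
VIRAMA = "\u1039"             # stacking sign used in consonant conjuncts

# A consonant starts a new syllable unless it is stacked under the previous
# consonant (preceded by VIRAMA) or closes the current syllable
# (followed by ASAT or VIRAMA).
BOUNDARY = re.compile(
    "(?<!" + VIRAMA + ")[" + CONSONANTS + "](?![" + ASAT + VIRAMA + "])"
)


def segment_syllables(text: str) -> List[str]:
    """Split text at every position where BOUNDARY finds a syllable onset."""
    starts = [m.start() for m in BOUNDARY.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading non-consonant material
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s < e]


if __name__ == "__main__":
    # The word "Myanmar": MA + medial RA + NA + ASAT, then MA + vowel sign AA.
    word = "\u1019\u103C\u1014\u103A\u1019\u102C"
    print(segment_syllables(word))  # two syllables
```

In this simplified scheme a stacked cluster is never split, so a word written with a conjunct stays in one segment. A complete rule set such as the one the paper describes would also need to handle independent vowels, digits, and punctuation, which this sketch leaves attached to the preceding segment.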