An Algorithm for Myanmar Syllable Segmentation based on the Official Standard Myanmar Unicode Text

Sun Thurain Moe, Than Than Nwe
Published in: 2023 IEEE Conference on Computer Applications (ICCA)
Publication date: 2023-02-27
DOI: 10.1109/ICCA51723.2023.10181391 (https://doi.org/10.1109/ICCA51723.2023.10181391)
Citations: 0

Abstract

The Myanmar language and its script are complex and do not closely resemble those of any other language, so current linguistic and NLP theories do not seem to work well for Myanmar script. Syllable segmentation is a basic and important level of Myanmar NLP. This paper presents the Myanmar Syllable Segmentation (MSS) algorithm, which is based on the Pyidaungsu font, currently designated as the official Myanmar Unicode standard. After several trials and the successful removal of sources of confusion, we obtained a set of syllable segmentation rules that uses 16 vowels and one symbol used in consonant conjuncts. The rule set used in the proposed algorithm, which is clear and simple enough for the public to understand, was found to correctly segment all possible syllable combinations in Myanmar script.
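
The abstract does not list the MSS rules themselves, so the following is not the authors' algorithm. It is a minimal sketch of a common regular-expression heuristic for rule-based syllable segmentation of Myanmar Unicode text, under one stated assumption: a syllable starts at a consonant (U+1000 to U+1021) that is not preceded by the stacking sign virama (U+1039) and not followed by asat (U+103A) or virama. The function name `segment_syllables` and the constant names are hypothetical, and the sketch does not handle independent vowels, digits, punctuation, or non-canonical mark order.

```python
import re

# Assumed character classes (illustrative, not taken from the paper).
CONSONANTS = "\u1000-\u1021"  # KA .. A
VIRAMA = "\u1039"             # invisible stacking sign used in conjuncts
ASAT = "\u103A"               # visible "killer" mark on a final consonant

# A consonant opens a new syllable unless it is stacked under the previous
# consonant (preceded by virama) or is itself a final or stacking consonant
# (followed by asat or virama).
_SYLLABLE_START = re.compile(
    f"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])"
)


def segment_syllables(text):
    """Split text into syllable-like chunks at the assumed break points."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    # Keep any leading material that does not begin with a consonant.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s < e]


if __name__ == "__main__":
    # "မြန်မာစာ" written with escapes so the code points are explicit.
    word = "\u1019\u103C\u1014\u103A\u1019\u102C\u1005\u102C"
    print(segment_syllables(word))
```

Collecting break positions with `finditer` and slicing, rather than substituting a separator character into the string, avoids clashes when the input already contains that separator.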