A comparison of collation algorithm for Myanmar language

Yuzana, Khin Mar Lar Tun
Published in: 2008 Third International Conference on Digital Information Management, 2008-11-01
DOI: 10.1109/ICDIM.2008.4746740 (https://doi.org/10.1109/ICDIM.2008.4746740)

Abstract

Written Myanmar does not use white space to mark word boundaries, and Unicode database applications lack support for operations such as collation and searching. The need for a powerful collation strategy has prompted wide-ranging research in natural language processing. Consequently, we propose a new collation algorithm for the Myanmar language, MyCollate2, which extends MyCollate1. The algorithm is based on a heuristic chart (table). The method first slices names into syllables and then collates them according to the order of the traditional standard Myanmar dictionary. The proposed heuristic chart works well not only for syllable segmentation but also for collation of words. The algorithm can collate Myanmar names as well as Myanmar words with complex syllable structure, such as Pali, Pali loan, subscript, and kinzi styles. It was tested with Myanmar names, Pali words from Damma books, and words from a dictionary. The experimental results show that syllable slicing reaches 99.55% accuracy in comparison with other methods, and they also report slicing performance. Collation accuracy reaches 95.88%, which is better than that of the previous collation algorithm, MyCollate1.
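The two-stage idea described in the abstract (slice into syllables, then compare syllable by syllable) can be illustrated with a minimal Python sketch. This is not the MyCollate2 heuristic chart, which the abstract does not reproduce. The break rule below is a common simplification (break before a consonant unless it is stacked under a virama or closed by an asat or virama), and the sort key uses plain code point order as a stand-in for the dictionary order. The names `segment_syllables` and `collation_key` are hypothetical.

```python
import re

# Assumptions for illustration only (not taken from the paper):
CONSONANT = "\u1000-\u1021"   # basic Myanmar consonants, written as a regex range
ASAT = "\u103A"               # final-consonant killer
VIRAMA = "\u1039"             # stacking sign (subscript and kinzi forms)

# A syllable starts at a consonant that is not stacked under a virama
# and is not itself closed by a following asat or virama.
_BREAK = re.compile(rf"(?<!{VIRAMA})([{CONSONANT}])(?![{ASAT}{VIRAMA}])")


def segment_syllables(text):
    """Split text into syllables using the simplified break rule above."""
    marked = _BREAK.sub(lambda m: "\x00" + m.group(1), text)
    return [s for s in marked.split("\x00") if s]


def collation_key(word):
    """Per-syllable key: (first character, remainder).

    Code point order stands in for the dictionary order; a real
    implementation would look each part up in a weight table.
    """
    return [(s[0], s[1:]) for s in segment_syllables(word)]


myanmar = "\u1019\u103C\u1014\u103A\u1019\u102C"  # two syllables
ka = "\u1000\u102C"
ma = "\u1019\u102C"

syllables = segment_syllables(myanmar)
ordered = sorted([ma, myanmar, ka], key=collation_key)
```

Comparing lists of per-syllable keys, rather than raw strings, keeps a closed syllable together as one unit during comparison, which is the property the syllable-first design relies on.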