A. M. Mon, M. Thein, S. S. Htay, Soe Lai Phyue, Thinn Thinn Win
Published in: 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 2010-09-20
DOI: 10.1109/ICACTE.2010.5579805
Analysis of Myanmar Word boundary and segmentation by using Statistical Approach
This paper proposes a unified approach to Myanmar word analysis that combines Finite State Automata (FSA), a rule-based heuristic approach, and a statistical approach. Myanmar text has no inter-word spaces, which makes tokenization difficult, so we use an FSA to recognize words. Segmentation is a major problem because there is no delimiter, and segmentation errors cause subsequent failures in later NLP processes. Segmentation is also an essential preprocessing task for Natural Language Processing applications such as Machine Translation and Information Retrieval. In this system, the rule-based heuristic approach and the statistical approach are used with a corpus-based dictionary. Evaluation results showed that the method is very effective for the Myanmar language.
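The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea of statistical segmentation over a corpus-based dictionary, not the authors' implementation. It assumes a unigram model: each dictionary word has a corpus count, and the segmentation with the highest total log-probability is chosen by dynamic programming. The names `WORD_COUNTS` and `segment`, the `max_len` parameter, and the Latin placeholder strings standing in for unspaced Myanmar text are all hypothetical.

```python
import math

# Hypothetical toy dictionary with corpus counts.
# Latin placeholders are used instead of real Myanmar data.
WORD_COUNTS = {"the": 50, "them": 5, "me": 20, "men": 10, "eat": 12, "at": 15}


def segment(text, counts=WORD_COUNTS, max_len=6):
    """Return the most probable split of `text` into dictionary words.

    Uses a unigram model (an assumption of this sketch): the score of a
    segmentation is the sum of log relative frequencies of its words.
    Returns None when no full segmentation into dictionary words exists.
    """
    total = sum(counts.values())
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(-math.inf, -1)] * (len(text) + 1)
    best[0] = (0.0, -1)

    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in counts and best[start][0] > -math.inf:
                score = best[start][0] + math.log(counts[word] / total)
                if score > best[end][0]:
                    best[end] = (score, start)

    if best[-1][0] == -math.inf:
        return None  # some part of the text is not covered by the dictionary

    # Walk the back-pointers from the end to recover the word sequence.
    words = []
    i = len(text)
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]
```

For the ambiguous input `"themeat"`, this sketch compares the candidate splits `the|me|at` and `them|eat` and keeps the one with the higher unigram score under the toy counts. A real system along the lines of the abstract would presumably add the rule-based heuristics and handle words missing from the dictionary, which this sketch simply reports as `None`.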