A Finite State Transducer Based Morphological Analyzer for The Kazakh Language
Authors: Gulmira Tolegen, Alymzhan Toleu, R. Mussabayev
Published in: 2022 7th International Conference on Computer Science and Engineering (UBMK), 2022-09-14
DOI: 10.1109/UBMK55850.2022.9919445
Citation count: 2
Abstract
This paper presents a finite-state transducer (FST) based morphological analyzer for the Kazakh language that decomposes complex Kazakh words into consecutive morphemes, yielding the lemma, part of speech, and morphological tags. Because Kazakh is agglutinative, the analyzer may produce more than one analysis for a single word, depending on the word's complexity. We conducted several experiments to evaluate the analyzer's performance. It achieved 92% coverage on a large Wikipedia corpus and 96% coverage on news data, and its accuracy on the test data was 98.40%.
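To make the abstract's terms concrete, here is a toy sketch of FST-style analysis. It is not the authors' analyzer. It assumes a hypothetical three-stem lexicon, simplified Apertium-style tag names (<n>, <pl>, <px1pl>, <loc>, <nom>, <v>, <neg>, <imp>), and morphotactics that ignore vowel harmony and consonant alternations, both of which a real Kazakh analyzer must model. Each arc consumes a surface substring and emits tags, and an empty input string acts as an epsilon arc. The surface form "алма" receives two analyses, the noun 'apple' and the negative imperative of ал- 'take', which illustrates how one word can yield several analyses. The sketch also assumes that coverage means the share of tokens that receive at least one analysis.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy transducer; NOT the authors' lexicon or rules.
# Each arc: (input substring consumed, output tags emitted, next state).
# An empty input string is an epsilon arc (emits output without consuming input).
Arc = Tuple[str, str, str]

ARCS: Dict[str, List[Arc]] = {
    "root": [
        ("кітап", "кітап<n>", "noun"),  # 'book'
        ("алма", "алма<n>", "noun"),    # 'apple'
        ("ал", "ал<v>", "verb"),        # 'take'
    ],
    # Simplified noun morphotactics: stem (+plural) (+possessive) (+case).
    # Vowel harmony and consonant alternations are ignored in this toy.
    "noun": [("тар", "<pl>", "num"), ("", "", "num")],
    "num": [("ымыз", "<px1pl>", "poss"), ("", "", "poss")],
    "poss": [("да", "<loc>", "case"), ("", "<nom>", "case")],
    # Verb stem + negation, read as a bare (2nd person) imperative.
    "verb": [("ма", "<neg>", "neg")],
    "neg": [("", "<imp>", "end")],
}
FINAL: Set[str] = {"case", "end"}


def analyze(word: str) -> List[str]:
    """Return every analysis string for word (possibly several, possibly none)."""
    results: List[str] = []

    def walk(state: str, pos: int, out: str) -> None:
        # Accept only when the whole surface form is consumed in a final state.
        if state in FINAL and pos == len(word):
            results.append(out)
        # Follow every arc whose input matches at the current position.
        for inp, tags, nxt in ARCS.get(state, []):
            if word.startswith(inp, pos):
                walk(nxt, pos + len(inp), out + tags)

    walk("root", 0, "")
    return results


def coverage(tokens: List[str]) -> float:
    """Fraction of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if analyze(t)) / len(tokens)


if __name__ == "__main__":
    print(analyze("кітаптарымызда"))  # ['кітап<n><pl><px1pl><loc>']
    print(analyze("алма"))            # noun 'apple' and verb 'do not take'
    print(coverage(["кітаптарымызда", "алма", "қала"]))  # 'қала' is out of lexicon
```

The depth-first search explores every matching path, so ambiguity falls out naturally as multiple accepted paths rather than needing special handling. Production analyzers typically compile lexicons and phonological rules into a single transducer with dedicated toolkits instead of walking a Python dictionary.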