MorphUz: Morphological Analyzer for the Uzbek Language

N. Abdurakhmonova, Ismailov Alisher, Rano Sayfulleyeva
Published in: 2022 7th International Conference on Computer Science and Engineering (UBMK)
Publication date: 2022-09-14
DOI: https://doi.org/10.1109/UBMK55850.2022.9919579
Citations: 1

Abstract

Uzbek is an agglutinative language: words are derived from stems (roots) by concatenating affixes. This property yields a large number of morpheme combinations and greatly increases the vocabulary size. Words are therefore split into sub-word units for use in text and speech processing applications. Well-chosen sub-word units not only provide high coverage and a smaller lexicon, but also carry the semantic and syntactic information that downstream applications need. This paper discusses a morphological analyzer tool for natural language processing and machine learning purposes. The tool, named MorphUz, splits a text into a sequence of morphemes. Morphological analysis is one of the main components of natural language processing. MorphUz is an open-source morphological analyzer for the Uzbek language and is available as a website for exploration. It implements Uzbek morphology following a two-level approach that uses stemming and a suffix analyzer. MorphUz is implemented using PHP and JavaScript scripts and a MySQL database.
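The abstract gives no implementation details, so the following is only a minimal sketch of how a stem lexicon combined with suffix stripping can split an agglutinated word into morphemes. It is written in Python for brevity rather than in the PHP/JavaScript stack the paper names. The names `STEMS`, `SUFFIXES`, and `segment` are hypothetical, and the tiny stem and suffix lists are illustrative assumptions, not MorphUz's actual data or algorithm.

```python
# Toy stem lexicon and suffix inventory (assumptions for illustration only;
# a real analyzer would load a much larger set, e.g. from a database).
STEMS = {"kitob", "maktab", "uy", "bola"}
SUFFIXES = ["lar", "imiz", "ning", "ni", "ga", "da", "dan"]


def segment(word):
    """Split a word into [stem, suffix, ...]; return None if no analysis fits."""

    def strip(rest, found):
        # Stage 1: accept as soon as the remainder is a known stem.
        if rest in STEMS:
            return [rest] + found
        # Stage 2: peel one known suffix off the right edge and recurse,
        # backtracking if the shorter remainder cannot be analyzed.
        for suffix in SUFFIXES:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                result = strip(rest[: -len(suffix)], [suffix] + found)
                if result is not None:
                    return result
        return None

    return strip(word.lower(), [])


print(segment("kitoblarimizda"))  # ['kitob', 'lar', 'imiz', 'da']
print(segment("maktabdan"))       # ['maktab', 'dan']
```

Checking the remainder against the lexicon before stripping further keeps the sketch from cutting into a stem that merely ends in suffix-like letters. It does not model vowel-dependent suffix variants or ambiguity between competing analyses, both of which a full analyzer would have to handle.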