MorphUz: Morphological Analyzer for the Uzbek Language

N. Abdurakhmonova, Ismailov Alisher, Rano Sayfulleyeva
Published in: 2022 7th International Conference on Computer Science and Engineering (UBMK)
Publication date: 2022-09-14
DOI: 10.1109/UBMK55850.2022.9919579 (https://doi.org/10.1109/UBMK55850.2022.9919579)
Citations: 1

Abstract

Uzbek is an agglutinative language: words are derived from stems (roots) by concatenating affixes. This property produces a large number of morpheme combinations and greatly increases the vocabulary size. Words are therefore split into sub-word units for use in text and speech processing applications. Well-chosen sub-word units not only give high coverage with a smaller lexicon, but also carry the semantic and syntactic information that downstream applications need. This paper discusses a morphological analyzer tool for natural language processing and machine learning purposes. The tool, named MorphUz, splits the words of a text into sequences of morphemes. Morphological analysis is one of the main components of natural language processing. MorphUz is an open-source morphological analyzer for the Uzbek language and is available as a website for exploration. It implements Uzbek morphology following a two-level approach that combines stemming with a suffix analyzer. MorphUz is implemented with PHP and JavaScript scripts and a MySQL database.
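The idea of combining stemming with suffix analysis can be illustrated with a minimal sketch. It is not the MorphUz implementation (which the abstract says is written in PHP and JavaScript with a MySQL database); Python is used here only for brevity. The stem lexicon, the suffix list, and the `segment` function are all hypothetical, and a real analyzer would also need suffix ordering rules and handling of phonological alternations.

```python
from typing import List, Optional

# Toy data, assumed for illustration only; not MorphUz's actual lexicon.
STEMS = {"kitob", "maktab", "bola"}
SUFFIXES = ["lar", "imiz", "ning", "da", "ni", "ga"]


def segment(word: str) -> Optional[List[str]]:
    """Return [stem, suffix, ...] for a word, or None if no analysis is found."""
    word = word.lower()
    # Stemming step: stop as soon as the remaining string is a known stem.
    if word in STEMS:
        return [word]
    # Suffix step: strip one known suffix from the right, then recurse.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = segment(word[: -len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None


print(segment("kitoblarimizda"))  # ['kitob', 'lar', 'imiz', 'da']
print(segment("bolalar"))         # ['bola', 'lar']
```

Here `kitoblarimizda` ("in our books") is split into the stem `kitob` followed by the plural, possessive, and locative suffixes. Words whose stem is missing from the toy lexicon return `None`, which shows why lexicon coverage matters for this kind of analyzer.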