MorphUz: Morphological Analyzer for the Uzbek Language

2022 7th International Conference on Computer Science and Engineering (UBMK). Pub Date: 2022-09-14. DOI: 10.1109/UBMK55850.2022.9919579

N. Abdurakhmonova, Ismailov Alisher, Rano Sayfulleyeva


Citations: 1

Abstract


MorphUz: Morphological Analyzer for the Uzbek Language

Uzbek is an agglutinative language: words are derived from stems (roots) by concatenating affixes. This property produces a large number of morpheme combinations and greatly increases the vocabulary size. Words are therefore split into sub-word units for use in text and speech processing applications. Well-chosen sub-word units not only provide high coverage and a smaller lexicon, but also carry the semantic and syntactic information that downstream applications need. This paper discusses a morphological analyzer tool for natural language processing and machine learning purposes. The tool, named MorphUz, splits a text into a sequence of morphemes. Morphological analysis is one of the main parts of natural language processing. MorphUz is an open-source morphological analyzer for the Uzbek language and is available as a website for exploration. It implements the morphology of Uzbek following a two-level approach that uses stemming and a suffix analyzer. MorphUz is implemented using PHP and JavaScript scripts and a MySQL database.
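
To make the stem-plus-suffix idea concrete, here is a minimal sketch in Python. It is not the MorphUz implementation, which the abstract says is built with PHP, JavaScript, and MySQL. The names `STEMS`, `SUFFIXES`, `split_suffixes`, and `analyze` are hypothetical, and the tiny in-memory lexicon stands in for whatever database a real analyzer would query. The sketch first looks up the longest known stem, then tries to segment the remainder into known suffixes, backtracking when a choice leads to a dead end.

```python
from typing import List, Optional

# Hypothetical toy data: a real analyzer would load these from a lexicon database.
STEMS = {"kitob", "maktab", "uy"}
SUFFIXES = ["lar", "imiz", "ning", "da", "ni", "im"]


def split_suffixes(tail: str) -> Optional[List[str]]:
    """Segment `tail` into known suffixes, or return None if that is impossible."""
    if not tail:
        return []
    # Try longer suffixes first, backtracking when the rest cannot be segmented.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if tail.startswith(suffix):
            rest = split_suffixes(tail[len(suffix):])
            if rest is not None:
                return [suffix] + rest
    return None


def analyze(word: str) -> Optional[List[str]]:
    """Return [stem, suffix, ...] for `word`, or None if no analysis is found."""
    word = word.lower()
    # Level 1: stemming, preferring the longest stem found in the lexicon.
    for end in range(len(word), 0, -1):
        stem = word[:end]
        if stem in STEMS:
            # Level 2: suffix analysis of whatever follows the stem.
            suffixes = split_suffixes(word[end:])
            if suffixes is not None:
                return [stem] + suffixes
    return None


print(analyze("kitoblarimizda"))  # ['kitob', 'lar', 'imiz', 'da']
```

Here "kitoblarimizda" ("in our books") is split into the stem "kitob" and the suffixes "-lar", "-imiz", and "-da". A word whose stem is missing from the toy lexicon returns `None`, which illustrates why lexicon coverage matters for this kind of analyzer.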
