Toolset of “Turkic Morpheme” Portal for Creation of Electronic Corpora of Turkic Languages in a Unified Conceptual Space

2022 7th International Conference on Computer Science and Engineering (UBMK). Pub Date: 2022-09-14. DOI: 10.1109/UBMK55850.2022.9919449

A. Gatiatullin, Lenara Kubedinova, N. Prokopyev, Abduramanov Ibraim

Abstract

The creation of electronic corpora, both as a means of preserving and developing natural languages and as a resource base for NLP developers and language researchers, is seeing a rapid increase in the number and volume of corpora for many languages, including Turkic languages. However, many Turkic languages still have no corpus, because their developers have difficulty implementing and maintaining such large, technically demanding resources. This paper presents a toolset for creating Turkic corpora within the “Turkic Morpheme” web portal, a multilingual resource that combines language-independent models of the grammatical, syntactic, and semantic levels with language-dependent data for the Turkic language family. Using this toolset will help unify corpus annotation and NLP processing and will enrich the portal's resources by creating a unified conceptual space of Turkic electronic corpora.
