Toolset of “Turkic Morpheme” Portal for Creation of Electronic Corpora of Turkic Languages in a Unified Conceptual Space
A. Gatiatullin, Lenara Kubedinova, N. Prokopyev, Abduramanov Ibraim
2022 7th International Conference on Computer Science and Engineering (UBMK), published 2022-09-14
DOI: https://doi.org/10.1109/UBMK55850.2022.9919449
Abstract
The creation of electronic corpora, both as a means of preserving and developing natural languages and as a resource base for NLP developers and language researchers, is experiencing a rapid increase in the number and volume of corpora for many languages, including Turkic languages. However, many Turkic languages still have no corpus, because their developers have difficulty implementing such large, technically demanding resources and keeping them in operation. This paper presents a toolset for creating Turkic corpora within the “Turkic Morpheme” web portal, a multilingual resource that combines language-independent models of the grammatical, syntactic, and semantic levels with language-dependent data for the Turkic language family. Using this toolset will help to unify corpus annotation and NLP processing, and will enrich the portal's resources by creating a unified conceptual space of Turkic electronic corpora.