A. Gatiatullin, Lenara Kubedinova, N. Prokopyev, Abduramanov Ibraim
2022 7th International Conference on Computer Science and Engineering (UBMK), published 2022-09-14
DOI: 10.1109/UBMK55850.2022.9919449 (https://doi.org/10.1109/UBMK55850.2022.9919449)
Toolset of “Turkic Morpheme” Portal for Creation of Electronic Corpora of Turkic Languages in a Unified Conceptual Space
Electronic corpora serve both to preserve and develop natural languages and to provide a resource base for NLP developers and language researchers. Their number and volume are growing rapidly for many languages, including Turkic languages. However, many Turkic languages still have no corpus, because their developers struggle to implement and maintain such large, technically demanding resources. This paper presents a toolset for creating Turkic corpora within the “Turkic Morpheme” web portal, a multilingual resource that combines language-independent models of the grammatical, syntactic, and semantic levels with language-dependent data for the Turkic language family. Using this toolset will help unify corpus annotation and NLP processing, and will enrich the portal's resources by creating a unified conceptual space of Turkic electronic corpora.