Toolset of “Turkic Morpheme” Portal for Creation of Electronic Corpora of Turkic Languages in a Unified Conceptual Space

A. Gatiatullin, Lenara Kubedinova, N. Prokopyev, Abduramanov Ibraim
2022 7th International Conference on Computer Science and Engineering (UBMK). Published 2022-09-14. DOI: 10.1109/UBMK55850.2022.9919449 (https://doi.org/10.1109/UBMK55850.2022.9919449)
Citations: 0

Abstract

The creation of electronic corpora serves both as a means of preserving and developing natural languages and as a resource base for developers of NLP technologies and language researchers. The number and volume of such corpora are increasing rapidly for many languages, including Turkic languages. However, many Turkic languages still have no corpus, because their developers have difficulty implementing and supporting the operation of such large, technically demanding resources. This paper presents a toolset for creating Turkic corpora within the “Turkic Morpheme” web portal, a multilingual resource with language-independent grammatical, syntactic, and semantic models and language-dependent data for the Turkic language family. Using this toolset will help solve the problem of unifying corpus annotation and NLP processing, and will enrich the portal's resources by creating a unified conceptual space of Turkic electronic corpora.