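Quantitative and Semantic Analysis of Texts in Turkic Languages using Universal Declaration of Human Rights (UDHR) as a Corpus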

2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT). Pub Date: 2022-10-12. DOI: 10.1109/AICT55583.2022.10013645

A. Adamov, Gozel Khasanova



Abstract



Thanks to the Web, ubiquitous digital technologies, and the growing use of digital environments for work, entertainment, education, and other activities, huge amounts of textual data are generated and made available online. Text is the most informative and, at the same time, the most sophisticated data type in terms of its comprehension by machines. Text Analytics is a field that draws on a number of computer science disciplines to process textual data and transform it into a computer-readable format suitable for another field of study, Natural Language Processing, which extracts meaning. This research paper is an attempt to apply a broad variety of statistical analysis methods to corpora of several Turkic languages, using the Universal Declaration of Human Rights (UDHR) as a corpus. Quantitative Text Analysis, as a research area, focuses on understanding human language through statistics and numbers. As language is the most effective tool for describing the social world, Quantitative Text Analysis enables social exploration of the real world at scale.
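The abstract does not list which statistical methods the authors applied, so the following is only an illustrative sketch of the kind of descriptive statistics that quantitative text analysis commonly starts from: token count, vocabulary size, type-token ratio, mean word length, and most frequent words. The function name `text_stats` and the choice of measures are assumptions made for this example, not the paper's method. The sample string is the first sentence of UDHR Article 1 in English, used as a stand-in; a Turkic-language version of the declaration would be passed in the same way.

```python
import re
from collections import Counter


def text_stats(text: str) -> dict:
    """Compute basic descriptive statistics for one text (hypothetical helper)."""
    # Match runs of Unicode letters, so characters such as ə, ş, ğ, or ı stay
    # inside tokens; digits, underscores, and punctuation act as separators.
    # Caveat: str.lower() is locale-independent, so Turkish dotted/dotless I
    # is not folded the way Turkish orthography expects.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "mean_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        "top_words": counts.most_common(3),
    }


# Stand-in sample: first sentence of UDHR Article 1 (English).
sample = "All human beings are born free and equal in dignity and rights."
stats = text_stats(sample)
print(stats)
```

Running the same function over each language's version of the declaration would give directly comparable figures, since the UDHR is a parallel text with the same content in every language.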
