The Task of Identifying Morphological Errors of Words in the Kazakh Language in Social Networks

D. Rakhimova, Yntymak Abdrazakh
2022 7th International Conference on Computer Science and Engineering (UBMK), published 2022-09-14
DOI: 10.1109/UBMK55850.2022.9919516 (https://doi.org/10.1109/UBMK55850.2022.9919516)
Citations: 0

Abstract

Instagram, Vkontakte, Twitter and other social networks are attractive sources for collecting and analyzing the information in messages, because the information in these systems is real. However, text posts and user comments on social networks often deviate from the generally accepted norms of the language: they contain mistakes and deliberate distortions of words. Such inaccurate data cannot be used as-is in content analysis, statistical analysis, or sentiment analysis. Words with such errors can nevertheless be processed and analyzed once they are identified and corrected into the proper form. This paper presents the research and implementation of a model for detecting incorrect Kazakh words in semi-structured data, using comments and posts on social networks as an example. To address the task, the following was done: a comparative analysis of text correction systems was carried out; various spell-checking technologies were explored; the most frequent errors were identified and a classification of word errors is presented. A dictionary of Kazakh word stems was built from an electronic corpus. The authors developed an approach that identifies incorrect words and automatically replaces each one with a suitable candidate word. The identification of incorrect words in semi-structured data is based on a stemming algorithm with a lexicon of stems according to the CSE (Complete Set of Endings). Experimental calculations and an evaluation of the results have been carried out.
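The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: a word is accepted if it splits into a stem from a lexicon plus an ending from a fixed set of endings, and a rejected word is matched against nearby valid forms to propose a replacement. The names `STEMS`, `ENDINGS`, `split_word`, `suggest`, and `find_incorrect` are hypothetical, the three stems and the endings are toy entries rather than the paper's dictionary or CSE tables, and the use of `difflib` for candidate ranking is an assumption, not the authors' method.

```python
from difflib import get_close_matches

# Toy lexicon of stems and toy set of endings (illustrative entries only,
# NOT the paper's stem dictionary or its Complete Set of Endings).
STEMS = {"бала", "мектеп", "дос"}
ENDINGS = {"", "лар", "тар", "тер", "да", "та", "те", "ларда", "тарда", "терде"}


def split_word(word):
    """Return (stem, ending) if word = known stem + known ending, else None."""
    word = word.lower()
    # Move the cut point from the end of the word backwards,
    # so the longest possible stem is tried first.
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            return stem, ending
    return None


def is_correct(word):
    """A word counts as correct when it can be split into stem + ending."""
    return split_word(word) is not None


def suggest(word, n=3):
    """Propose replacement candidates for a word that failed the check."""
    # Candidate forms: every toy stem combined with every toy ending.
    forms = sorted(stem + ending for stem in STEMS for ending in ENDINGS)
    # Assumption: rank candidates by difflib similarity to the input word.
    return get_close_matches(word.lower(), forms, n=n, cutoff=0.7)


def find_incorrect(text):
    """Return the tokens of a post or comment that fail the stem + ending check."""
    tokens = [t.strip(".,!?:;\"'()") for t in text.split()]
    return [t for t in tokens if t and not is_correct(t)]
```

With these toy tables, `find_incorrect("Балалар мекттепте!")` flags only the misspelled `мекттепте`, and `suggest("мекттепте")` ranks `мектепте` first. The sketch pairs every stem with every ending, so it also accepts combinations that break Kazakh vowel harmony and consonant assimilation; a real ending table would have to restrict which endings may follow which stems.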