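Application of machine learning methods in the classification of corruption related content in Russian-speaking and English-speaking Internet media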

Sociology: methodology, methods, mathematical modeling (Sociology: 4M) Pub Date: 2022-03-19 DOI:10.19181/4m.2021.52.5

E. Artemova, Aleksandr Maksimenko, Dmitriy Ohrimenko

Citations: 2

Abstract

The paper classifies corruption-related content published by Russian-language and English-language Internet media using machine learning methods. The proposed methodological approach is relevant and promising because, according to our earlier data, the corruption-monitoring mechanisms based on advanced information technologies that are described in foreign publications have rather limited potential effectiveness and are not always adequately interpreted. The study sets out the principles and grounds for selecting the identification parameters and describes in detail the annotation scheme of the collected news corpus. Automatic text processing was carried out in two stages (text vectorization, then applying a learning model) and was used to solve four main tasks: (1) extracting a significant quote from a news article that identifies the text as corruption-related; (2) predicting the type of news message; (3) predicting the relevant article of the Criminal Code of the Russian Federation under which responsibility for the described corruption offense is determined; and (4) predicting the type of relationship in the corruption offense. The results show that modern automatic text processing methods successfully identify and classify corruption-related content in both Russian and English.
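
The abstract names the two processing stages (text vectorization, then a learning model) but not the concrete vectorizer, model, label sets, or corpus. The following is a minimal, standard-library-only Python sketch of that two-stage shape, assuming TF-IDF vectorization and a nearest-centroid classifier. Both techniques, the helper names (`TfidfVectorizer`, `NearestCentroid`, `key_sentence`), the labels `corruption`/`other`, and the toy sentences are hypothetical illustrations, not the authors' method or data. `key_sentence` shows one possible reading of the first task: picking the sentence most similar to the corruption class.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware for str patterns, so Cyrillic and Latin words both match
    return re.findall(r"\w+", text.lower())


def l2_normalise(vec: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are L2-normalised, so the dot product equals cosine similarity
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class TfidfVectorizer:
    """Stage 1 (assumed technique): raw text -> sparse, L2-normalised TF-IDF vector."""

    def fit(self, texts: list[str]) -> "TfidfVectorizer":
        df = Counter()
        for text in texts:
            df.update(set(tokenize(text)))  # document frequency of each term
        n = len(texts)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, text: str) -> dict[str, float]:
        # Out-of-vocabulary tokens are ignored
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        return l2_normalise({t: c * self.idf[t] for t, c in counts.items()})


class NearestCentroid:
    """Stage 2 (assumed technique): one normalised centroid per label; predict the closest."""

    def fit(self, vectors: list[dict[str, float]], labels: list[str]) -> "NearestCentroid":
        sums = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for t, w in vec.items():
                sums[label][t] += w
        self.centroids = {label: l2_normalise(acc) for label, acc in sums.items()}
        return self

    def predict(self, vec: dict[str, float]) -> str:
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


def key_sentence(article: str, vectorizer: TfidfVectorizer,
                 model: NearestCentroid, label: str) -> str:
    """Hypothetical helper: return the sentence closest to the given label's centroid."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]
    centroid = model.centroids[label]
    return max(sentences, key=lambda s: cosine(vectorizer.transform(s), centroid))


# Hypothetical toy corpus (not the authors' data), mixing English and Russian
train_texts = [
    "Official detained for taking a bribe from a contractor",
    "Мэр задержан по подозрению в получении взятки",
    "Minister charged with embezzlement of budget funds",
    "Football club wins the national cup final",
    "Новый стадион открыт в городе",
    "Weather forecast promises rain this weekend",
]
train_labels = ["corruption"] * 3 + ["other"] * 3

vectorizer = TfidfVectorizer().fit(train_texts)
model = NearestCentroid().fit([vectorizer.transform(t) for t in train_texts], train_labels)

print(model.predict(vectorizer.transform("Prosecutors say the governor accepted a bribe")))
# prints: corruption
```

Nearest-centroid is used here only because it fits in a few lines; under the same assumptions, the other tasks (news-message type, Criminal Code article, relationship type) would reuse the vectorizer and refit the second stage with a different label set.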

Source journal

Sociology: methodology, methods, mathematical modeling (Sociology: 4M)

Self-citation rate

0.00%

Articles published