E. Artemova, Aleksandr Maksimenko, Dmitriy Ohrimenko
{"title":"机器学习方法在俄语和英语网络媒体中腐败相关内容分类中的应用","authors":"E. Artemova, Aleksandr Maksimenko, Dmitriy Ohrimenko","doi":"10.19181/4m.2021.52.5","DOIUrl":null,"url":null,"abstract":"The paper attempts to classify the corruption-related media content of Russian-language and English-language Internet media using machine learning methods. The methodological approach proposed in the article is very relevant and promising, since, according to our earlier data, corruption monitoring mechanisms used in foreign publications based on the use of advanced information technologies have rather limited potential effectiveness and are not always adequately interpreted. The study shows the principles and grounds for identifying identification parameters, and also describes in detail the layout scheme of the collected news array. In the course of automatic text processing, which took place in 2 stages (vectorization of the text and the use of a learning model), it was possible to solve the main 4 tasks: highlighting a significant quote from a news article to identify a text on corruption topics, predicting the type of news message, predicting a relevant article of the Criminal Code of the Russian Federation, which is used to determine responsibility for the described corruption offense, as well as predicting the type of relationship in corruption offenses. The results obtained showed that modern methods of automatic text processing successfully cope with the tasks of identification and classification of corruption-related content in both Russian and English.","PeriodicalId":271863,"journal":{"name":"Sociology: methodology, methods, mathematical modeling (Sociology: 4M)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Application of machine learning methods in the classification of corruption related content in Russian-speaking and English-speaking Internet media\",\"authors\":\"E. Artemova, Aleksandr Maksimenko, Dmitriy Ohrimenko\",\"doi\":\"10.19181/4m.2021.52.5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The paper attempts to classify the corruption-related media content of Russian-language and English-language Internet media using machine learning methods. The methodological approach proposed in the article is very relevant and promising, since, according to our earlier data, corruption monitoring mechanisms used in foreign publications based on the use of advanced information technologies have rather limited potential effectiveness and are not always adequately interpreted. The study shows the principles and grounds for identifying identification parameters, and also describes in detail the layout scheme of the collected news array. In the course of automatic text processing, which took place in 2 stages (vectorization of the text and the use of a learning model), it was possible to solve the main 4 tasks: highlighting a significant quote from a news article to identify a text on corruption topics, predicting the type of news message, predicting a relevant article of the Criminal Code of the Russian Federation, which is used to determine responsibility for the described corruption offense, as well as predicting the type of relationship in corruption offenses. The results obtained showed that modern methods of automatic text processing successfully cope with the tasks of identification and classification of corruption-related content in both Russian and English.\",\"PeriodicalId\":271863,\"journal\":{\"name\":\"Sociology: methodology, methods, mathematical modeling (Sociology: 4M)\",\"volume\":\"67 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sociology: methodology, methods, mathematical modeling (Sociology: 4M)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.19181/4m.2021.52.5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sociology: methodology, methods, mathematical modeling (Sociology: 4M)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.19181/4m.2021.52.5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Application of machine learning methods in the classification of corruption related content in Russian-speaking and English-speaking Internet media
The paper attempts to classify the corruption-related media content of Russian-language and English-language Internet media using machine learning methods. The methodological approach proposed in the article is very relevant and promising, since, according to our earlier data, corruption monitoring mechanisms used in foreign publications based on the use of advanced information technologies have rather limited potential effectiveness and are not always adequately interpreted. The study shows the principles and grounds for identifying identification parameters, and also describes in detail the layout scheme of the collected news array. In the course of automatic text processing, which took place in 2 stages (vectorization of the text and the use of a learning model), it was possible to solve the main 4 tasks: highlighting a significant quote from a news article to identify a text on corruption topics, predicting the type of news message, predicting a relevant article of the Criminal Code of the Russian Federation, which is used to determine responsibility for the described corruption offense, as well as predicting the type of relationship in corruption offenses. The results obtained showed that modern methods of automatic text processing successfully cope with the tasks of identification and classification of corruption-related content in both Russian and English.