Text sentiment analysis in banking

IF 0.3 Q4 MATHEMATICS, APPLIED

Journal of Applied Mathematics & Informatics. Pub Date: 2022-05-31. DOI: 10.37791/2687-0649-2022-17-3-5-15

S.P. Stroev, A. V. Zakharov, Zhanna V. Meksheneva, Valentin V. Shokolov, A. M. Nechaev, N. N. Lyublinskaya


Citations: 1

Abstract



The paper presents the authors' approach to sentiment analysis of online Russian-language messages about the activities of banks. The data are customer reviews of banks in general and of their products, services, and quality of service, posted on the Banki.ru portal. Sentiment analysis is treated as a binary classification task over a set of positive and negative reviews. The collected and preprocessed texts are represented with a vector space model using tf-idf weighting. Five algorithms, with hyperparameters selected by grid search, were applied to the binary classification task: naive Bayes classifier, support vector machine, logistic regression, random forest, and gradient boosting. Standard metrics (accuracy, recall, and F-measure) were used to evaluate classification quality. On these metrics, the best results were obtained with the model based on the support vector machine. Topic modeling was also carried out using latent Dirichlet allocation (LDA) to identify the most typical topics of customer messages; the most common topics were "cards" and "quality of service". The results can be used by banks to automate monitoring of their reputation in the media and to route client requests. The work relied on Python libraries for web scraping, machine learning, and natural language processing.
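The pipeline described in the abstract (tf-idf representation, a binary classifier, then precision, recall, and F-measure) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not name the libraries used, the four training reviews and two test reviews are invented English stand-ins for Banki.ru data, every function name is hypothetical, and the classifier is a multinomial naive Bayes (one of the five algorithms listed) rather than the support vector machine that performed best. Grid search over hyperparameters is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (real preprocessing would be richer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Map a document to {term: tf * idf}; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial naive Bayes over tf-idf weights with additive smoothing."""
    model = {}
    for c in sorted(set(labels)):
        weight = Counter()
        for doc, y in zip(docs, labels):
            if y == c:
                weight.update(tfidf(doc, idf))  # accumulate per-class term weights
        total = sum(weight.values()) + alpha * len(idf)
        log_prior = math.log(labels.count(c) / len(labels))
        log_like = {t: math.log((weight[t] + alpha) / total) for t in idf}
        model[c] = (log_prior, log_like)
    return model


def predict(doc, idf, model):
    """Return the class with the highest log-posterior score."""
    vec = tfidf(doc, idf)
    scores = {c: prior + sum(w * like[t] for t, w in vec.items())
              for c, (prior, like) in model.items()}
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall, and F-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy reviews (assumption: stand-ins for labeled portal reviews).
train_docs = [
    "fast card delivery and polite staff",
    "great mobile app and helpful support",
    "card blocked without reason and rude staff",
    "hidden fees and slow support",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["polite and helpful staff", "rude support and hidden fees"]
test_labels = ["pos", "neg"]

idf = fit_idf(train_docs)
model = train_nb(train_docs, train_labels, idf)
predictions = [predict(doc, idf, model) for doc in test_docs]
metrics = precision_recall_f1(test_labels, predictions)
print(predictions, metrics)
```

In practice a machine learning library would likely replace the hand-written vectorizer and classifier, and the held-out set would be far larger than two reviews. The sketch only shows how the tf-idf weights feed the classifier and how the three metrics are derived from true positives, false positives, and false negatives.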


Source journal

Journal of Applied Mathematics & Informatics (MATHEMATICS, APPLIED)

CiteScore

0.70

Self-citation rate

0.00%

Number of publications