ParsBERT Post-Training for Sentiment Analysis of Tweets Concerning Stock Market

2021 26th International Computer Conference, Computer Society of Iran (CSICC). Pub Date: 2021-03-03. DOI: 10.1109/CSICC52343.2021.9420569

Mohammadjalal Pouromid, Arman Yekkehkhani, M. A. Oskoei, Amin Aminimehr


Citations: 3

Abstract

Social media has become a playground where users share their ideas freely. Analyzing these data is of special interest to authorities and consulting firms, which seek to choose the right policies based on the insight acquired. Hence, sentiment analysis of data spread on social media has gained significant importance. There are two major approaches to sentiment analysis: lexicon-based methods and supervised methods. Among supervised methods, deep models have proven to be a better fit for the sentiment analysis task, since they are domain-free and able to handle large volumes of data effectively. In particular, BERT's state-of-the-art performance on various natural language processing tasks encouraged us to use this network architecture for sentiment analysis. In this research, more than 12,000 Persian tweets containing the stock market keyword were crawled from Twitter and manually labeled as positive, neutral, or negative. A pre-trained ParsBERT model was then fine-tuned on these data. The model is evaluated on the test dataset and compared with a lexicon-based counterpart that uses Polyglot as its lexicon. The proposed model achieves an accuracy of 82 percent, surpassing its lexicon-based contender.
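To make the lexicon-based baseline and the accuracy metric mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The lexicon `TOY_LEXICON`, its English entries, and the helper names `lexicon_label` and `accuracy` are hypothetical and chosen only for illustration. The paper uses Polyglot as its lexicon and Persian tweets, neither of which is reproduced here.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity). This is NOT the Polyglot
# lexicon used in the paper; the entries are illustrative assumptions.
TOY_LEXICON: Dict[str, int] = {
    "gain": 1, "profit": 1, "growth": 1,
    "loss": -1, "crash": -1, "fall": -1,
}


def lexicon_label(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> str:
    """Sum word polarities and map the sign of the total to one of three classes."""
    # Whitespace tokenization is a simplifying assumption; unknown words score 0.
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions that match the manually assigned labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

In a comparison of the kind the abstract describes, the same `accuracy` function would presumably be applied to the predictions of both the lexicon-based baseline and the fine-tuned classifier on the same held-out test set.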


