SA-MAIS: Hybrid automatic sentiment analyser for stock market

Impact factor 1.7; Region 4 (Management); JCR Q3, Computer Science, Information Systems

Journal of Information Science Pub Date : 2023-05-06 DOI:10.1177/01655515231171361

Bruno Taborda, Ana Maria de Almeida, José Carlos Dias, Fernando Batista, R. Ribeiro


Citations: 0

Abstract

Sentiment analysis of stock-related tweets is a challenging task, not only due to the specificity of the domain but also because of the short nature of the texts. This work proposes SA-MAIS, a two-step lightweight methodology, specially adapted to perform sentiment analysis in domain-constrained short-text messages. To tackle the issue of domain specificity, based on word frequency, the most relevant words are automatically extracted from the new domain and then manually tagged to update an existing domain-specific sentiment lexicon. The sentiment classification is then performed by combining the updated domain-specific lexicon with VADER sentiment analysis, a well-known and widely used sentiment analysis tool. The proposed method is compared with other well-known and widely used sentiment analysis tools, including transformer-based models, such as BERTweet, Twitter-roBERTa and FinBERT, on a domain-specific corpus of stock market-related tweets comprising 1 million messages. The experimental results show that the proposed approach largely surpasses the performance of the other sentiment analysis tools, reaching an overall accuracy of 72.0%. The achieved results highlight the advantage of using a hybrid method that combines domain-specific lexicons with existing generalist tools for the inference of textual sentiment in domain-specific short-text messages.
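The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tweets, the stopword list, the valence values and the function names are all hypothetical, and a tiny dictionary stands in for VADER so that the example runs with the standard library only. The abstract also does not say exactly how the lexicon and VADER are combined; here the domain entries simply override the general ones. With the `vaderSentiment` package, a comparable effect is commonly obtained by updating the analyzer's `lexicon` dictionary with the manually tagged words.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the paper's 1-million-tweet corpus is not reproduced.
TWEETS = [
    "$AAPL looking bullish after earnings, going long",
    "Bearish on $TSLA, might short it tomorrow",
    "$AAPL breakout! bullish momentum, great quarter",
    "Terrible guidance, $TSLA bearish trend continues",
]

# Stand-in for a general-purpose lexicon such as VADER's (word -> valence).
GENERAL_LEXICON = {"great": 3.0, "terrible": -2.0}

# Hypothetical stopword list used only when ranking candidate words.
STOPWORDS = {"on", "it", "after", "the", "a", "might", "going", "looking"}


def tokenize(text):
    """Lowercase the text, split it into words and drop cashtags like $AAPL."""
    tokens = re.findall(r"\$?[a-z]+", text.lower())
    return [tok for tok in tokens if not tok.startswith("$")]


def candidate_words(tweets, known, top_n=5):
    """Step 1: rank words missing from the known lexicon by corpus frequency."""
    counts = Counter(
        tok
        for tweet in tweets
        for tok in tokenize(tweet)
        if tok not in known and tok not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_n)]


# Step 1 (continued): a person reviews the candidates and assigns valences.
# These values are illustrative assumptions, not taken from the paper.
DOMAIN_LEXICON = {"bullish": 2.0, "bearish": -2.0, "long": 1.0, "short": -1.0}


def classify(tweet, general, domain):
    """Step 2: score a tweet with the general lexicon extended by the domain one."""
    lexicon = {**general, **domain}  # domain entries override general ones
    score = sum(lexicon.get(tok, 0.0) for tok in tokenize(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(candidate_words(TWEETS, GENERAL_LEXICON, top_n=2))
    for tweet in TWEETS:
        print(classify(tweet, GENERAL_LEXICON, DOMAIN_LEXICON), "|", tweet)
```

Passing an empty domain lexicon to `classify` shows the problem the method targets: a phrase such as "going long, very bullish" contains no word from the general lexicon, so it is scored as neutral until the domain-specific terms are added.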

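Source journal

Journal of Information Science (Engineering and Technology; Computer Science: Information Systems)

CiteScore: 6.80
Self-citation rate: 8.30%
Articles published: 121
Review time: 4 months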

Journal description: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.