Induction of a sentiment dictionary for financial analyst communication: a data-driven approach balancing machine learning and human intuition

IF 1.6, JCR Q3, Computer Science, Information Systems

Journal of Business Analytics. Pub Date: 2021-07-19. DOI: 10.1080/2573234X.2021.1955022

Matthias Palmer, J. Roeder, Jan Muntermann


Citations: 5

Abstract

While sentiment dictionaries are easy to apply and provide reproducible results, they often classify less accurately than machine learning approaches trained for specific application domains. Nevertheless, both approaches typically require manual data analysis. This paper develops a domain-specific dictionary using regularised linear models trained on textual reports of financial analysts. The first evaluation step demonstrates that the resulting financial analyst dictionary explains cumulative abnormal stock returns related to earnings events more accurately than other finance-related dictionaries and sentiment classifiers. In a second step, the approaches are compared against manually annotated sentiment. The financial analyst dictionary is more accurate than other dictionary-based approaches, although it cannot compete with a pre-trained deep learning sentiment classifier. While we show that the proposed approach suits texts of financial analysts, it can be applied to other use cases. The approach achieves context specificity while reducing extensive manual data analysis.
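The idea of inducing a dictionary from a regularised linear model can be illustrated with a minimal sketch. The abstract does not state which regulariser, target variable, or preprocessing the authors use, so everything here is an assumption: an L1 (lasso) penalty fitted by coordinate descent, a numeric return per report as the target, whitespace tokenisation, and made-up toy texts and returns. The names `soft_threshold` and `induce_dictionary` are hypothetical.

```python
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def induce_dictionary(texts, returns, alpha=0.01, n_iter=200):
    """Fit an L1-regularised linear model of returns on word counts.

    Minimises (1 / 2n) * ||y - Xb||^2 + alpha * ||b||_1 by cyclic
    coordinate descent. Words with a non-zero coefficient form the
    dictionary; the coefficient sign is read as polarity.
    """
    tokens = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokens for w in doc})
    counts = [Counter(doc) for doc in tokens]
    n = len(texts)
    # One column of word counts per vocabulary entry.
    columns = [[c[w] for c in counts] for w in vocab]
    mean_y = sum(returns) / n
    residual = [r - mean_y for r in returns]  # centred target, no intercept
    coef = [0.0] * len(vocab)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(x * x for x in col)
            if norm == 0:
                continue
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = sum(x * (r + x * coef[j]) for x, r in zip(col, residual))
            new = soft_threshold(rho / n, alpha) / (norm / n)
            delta = new - coef[j]
            if delta:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coef[j] = new
    return {w: b for w, b in zip(vocab, coef) if b != 0.0}


# Toy data (invented for illustration, not from the paper).
texts = [
    "results beat estimates",
    "results miss estimates",
    "revenue beat estimates",
    "revenue miss estimates",
]
returns = [0.04, -0.04, 0.02, -0.02]

dictionary = induce_dictionary(texts, returns, alpha=0.005)
for word, weight in sorted(dictionary.items()):
    print(word, round(weight, 4))
```

In this toy setting the penalty keeps only the words whose counts track the target and drops words that appear equally in positive and negative reports, which is the sense in which regularisation can reduce manual word selection. A real application would need proper tokenisation, feature scaling, and a tuned penalty.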


Source journal

Journal of Business Analytics (Business, Management and Accounting: Management Information Systems)

CiteScore

2.50

Self-citation rate

0.00%

Publication count