More than a Feeling: Benchmarks for Sentiment Analysis Accuracy

Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp
{"title":"不仅仅是一种感觉:情绪分析准确性的基准","authors":"Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp","doi":"10.2139/ssrn.3489963","DOIUrl":null,"url":null,"abstract":"The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. 
We compute benchmark values that take both methodological choices and application context into account.","PeriodicalId":301794,"journal":{"name":"Communication & Computational Methods eJournal","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"More than a Feeling: Benchmarks for Sentiment Analysis Accuracy\",\"authors\":\"Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp\",\"doi\":\"10.2139/ssrn.3489963\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. 
We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. We compute benchmark values that take both methodological choices and application context into account.\",\"PeriodicalId\":301794,\"journal\":{\"name\":\"Communication & Computational Methods eJournal\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communication & Computational Methods eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3489963\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication & Computational Methods eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3489963","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 29

Abstract

The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. We compute benchmark values that take both methodological choices and application context into account.
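The paper's central variance-decomposition finding (method choice alone explains ~10% of accuracy variance; adding contextual factors such as data-set characteristics raises explanatory power above 75%) can be illustrated with a toy sketch. The numbers, method names, and data-set names below are invented for illustration and are not the paper's data; the sketch simply shows how R² rises when a group-mean model conditions on context as well as method:

```python
import random
from itertools import product

random.seed(0)

# Hypothetical effect sizes: accuracy shifts a little by method
# and a lot by data set, mirroring the paper's qualitative finding.
methods = {"lexicon": 0.00, "svm": 0.02, "transformer": 0.04}
datasets = {"tweets": 0.55, "reviews": 0.80, "news": 0.70}

rows = []  # (method, dataset, observed accuracy)
for (m, m_shift), (d, d_base) in product(methods.items(), datasets.items()):
    for _ in range(20):
        rows.append((m, d, d_base + m_shift + random.gauss(0, 0.03)))

def r_squared(rows, key):
    """R^2 of a group-mean model: share of accuracy variance
    explained by grouping the observations by `key`."""
    ys = [acc for _, _, acc in rows]
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    ss_resid = sum(
        sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values()
    )
    return 1.0 - ss_resid / ss_total

r2_method = r_squared(rows, lambda r: r[0])          # method only
r2_full = r_squared(rows, lambda r: (r[0], r[1]))    # method + data set
print(f"method only:       R^2 = {r2_method:.2f}")
print(f"method + data set: R^2 = {r2_full:.2f}")
```

Because the synthetic data-set effects dwarf the method effects, the method-only model explains little variance while the joint model explains most of it, which is the qualitative pattern the meta-analysis reports.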