More than a Feeling: Benchmarks for Sentiment Analysis Accuracy

Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp
{"title":"不仅仅是一种感觉:情绪分析准确性的基准","authors":"Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp","doi":"10.2139/ssrn.3489963","DOIUrl":null,"url":null,"abstract":"The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. 
We compute benchmark values that take both methodological choices and application context into account.","PeriodicalId":301794,"journal":{"name":"Communication & Computational Methods eJournal","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"More than a Feeling: Benchmarks for Sentiment Analysis Accuracy\",\"authors\":\"Mark Heitmann, Christian Siebert, Jochen Hartmann, Christina Schamp\",\"doi\":\"10.2139/ssrn.3489963\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. 
We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. We compute benchmark values that take both methodological choices and application context into account.\",\"PeriodicalId\":301794,\"journal\":{\"name\":\"Communication & Computational Methods eJournal\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communication & Computational Methods eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3489963\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication & Computational Methods eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3489963","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 29

Abstract

The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. We compute benchmark values that take both methodological choices and application context into account.
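The paper's central variance-decomposition finding (method choice alone explains ~10% of accuracy variance; adding contextual factors such as data-set characteristics raises explanatory power above 75%) can be illustrated with a toy sketch. The numbers, method names, and data-set names below are invented for illustration and are not the paper's data; the sketch simply shows how R² rises when a group-mean model conditions on context as well as method:

```python
import random
from itertools import product

random.seed(0)

# Hypothetical effect sizes: accuracy shifts a little by method
# and a lot by data set, mirroring the paper's qualitative finding.
methods = {"lexicon": 0.00, "svm": 0.02, "transformer": 0.04}
datasets = {"tweets": 0.55, "reviews": 0.80, "news": 0.70}

rows = []  # (method, dataset, observed accuracy)
for (m, m_shift), (d, d_base) in product(methods.items(), datasets.items()):
    for _ in range(20):
        rows.append((m, d, d_base + m_shift + random.gauss(0, 0.03)))

def r_squared(rows, key):
    """R^2 of a group-mean model: share of accuracy variance
    explained by grouping the observations by `key`."""
    ys = [acc for _, _, acc in rows]
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    ss_resid = sum(
        sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values()
    )
    return 1.0 - ss_resid / ss_total

r2_method = r_squared(rows, lambda r: r[0])          # method only
r2_full = r_squared(rows, lambda r: (r[0], r[1]))    # method + data set
print(f"method only:       R^2 = {r2_method:.2f}")
print(f"method + data set: R^2 = {r2_full:.2f}")
```

Because the synthetic data-set effects dwarf the method effects, the method-only model explains little variance while the joint model explains most of it, which is the qualitative pattern the meta-analysis reports.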