The advantages of lexicon-based sentiment analysis in an age of machine learning.

IF 2.6 | Zone 3, Multidisciplinary Journals | JCR Q1, MULTIDISCIPLINARY SCIENCES
PLoS ONE Pub Date : 2025-01-10 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0313092
A Maurits van der Veen, Erik Bleich
PLoS ONE, volume 20, issue 1, e0313092. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11723603/pdf/

Abstract

Assessing whether texts are positive or negative (sentiment analysis) has wide-ranging applications across many disciplines. Automated approaches make it possible to code nearly unlimited quantities of text rapidly, replicably, and with high accuracy. Compared to machine learning and large language model (LLM) approaches, lexicon-based methods may sacrifice some performance, but in exchange they provide generalizability and domain independence and, crucially, make it possible to identify gradations in sentiment. We demonstrate the strong performance of lexica using MultiLexScaled, an approach that averages valences across a number of widely used general-purpose lexica. We validate it against benchmark datasets from a range of different domains, comparing its performance against machine learning and LLM alternatives. In addition, we illustrate the value of identifying fine-grained sentiment levels by showing, in an analysis of pre- and post-9/11 British press coverage of Muslims, that binarized valence metrics give rise to different (and erroneous) conclusions about the nature of the post-9/11 shock as well as about differences between broadsheet and tabloid coverage. The code to apply MultiLexScaled is available online.
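The idea of averaging valences across several lexica, and of what is lost when the result is binarized, can be illustrated with a minimal Python sketch. This is not the authors' MultiLexScaled code: the two toy lexica, the rescaling by largest absolute valence, and the function names (rescale, lexicon_score, averaged_valence, binarize) are all assumptions made for illustration only.

```python
from statistics import mean

# Toy lexica invented for illustration (assumption); real general-purpose
# lexica are far larger and each uses its own valence scale.
LEXICON_A = {"good": 3.0, "great": 5.0, "bad": -3.0, "terrible": -5.0}
LEXICON_B = {"good": 0.5, "great": 0.9, "bad": -0.6, "awful": -1.0}


def rescale(lexicon):
    """Divide valences by the largest absolute value so they fall in [-1, 1].

    This simple rescaling is an assumption, not the paper's scaling procedure.
    """
    peak = max(abs(v) for v in lexicon.values())
    return {word: v / peak for word, v in lexicon.items()}


def lexicon_score(tokens, lexicon):
    """Mean valence of the tokens found in the lexicon; 0.0 if none match."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return mean(hits) if hits else 0.0


def averaged_valence(text, lexica):
    """Average the per-lexicon scores of a text across all lexica."""
    tokens = text.lower().split()
    return mean([lexicon_score(tokens, rescale(lex)) for lex in lexica])


def binarize(score):
    """Collapse a graded score to a positive/negative label."""
    return "positive" if score >= 0 else "negative"


lexica = [LEXICON_A, LEXICON_B]
mild = averaged_valence("a good result", lexica)
strong = averaged_valence("a great result", lexica)
print(mild, binarize(mild))
print(strong, binarize(strong))
```

In this toy setup the two texts receive clearly different graded scores but the same binarized label, which is the kind of information loss the abstract attributes to binarized valence metrics.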


Source journal
PLoS ONE (Biology)
CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months
Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage