Identifying hidden patterns of fake COVID-19 news: An in-depth sentiment analysis and topic modeling approach

Tanvir Ahammad
Natural Language Processing Journal, Volume 6, Article 100053. Published 2024-01-05. DOI: 10.1016/j.nlp.2024.100053
https://www.sciencedirect.com/science/article/pii/S2949719124000013

Abstract

The spread of misinformation and fake news about COVID-19 has become a critical concern. It erodes trust in public health authorities, hinders efforts to control the virus’s spread, and puts people’s lives at risk. This study aims to gain insight into the types of misinformation being spread and to develop an in-depth analytical approach for analyzing COVID-19 fake news. It combines Sentiment Analysis (SA) and Topic Modeling (TM) to improve the accuracy of topic extraction from large volumes of unstructured text by taking the sentiment of the words into account. A dataset of 10,254 news headlines from various sources was collected and prepared, and rule-based SA was applied to label the headlines with three sentiment tags. Among the TM models evaluated, Latent Dirichlet Allocation (LDA) achieved the highest coherence scores, 0.66 for 20 coherent negative-sentiment topics and 0.573 for 18 coherent positive-sentiment fake news topics, outperforming Non-negative Matrix Factorization (NMF) (coherence: 0.43) and Latent Semantic Analysis (LSA) (coherence: 0.40). The extracted topics show that the misinformation revolves primarily around the COVID vaccine, crime, quarantine, medicine, and political and social issues. This research offers insight into the effects of COVID-19 fake news, provides a method for detecting and analyzing misinformation, and emphasizes the importance of understanding the patterns and themes of fake news for protecting public health and promoting scientific accuracy. The approach can also support the development of real-time monitoring systems to combat misinformation, and it is applicable beyond COVID-19-related fake news.
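
The pipeline outlined in the abstract (rule-based sentiment labeling, then topic modeling within each sentiment group, then coherence scoring) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the sentiment tool, the third tag (taken here to be "neutral"), or the coherence measure. The toy `LEXICON`, the helper names, and the UMass-style coherence are hypothetical stand-ins, so the scores it produces are not comparable to the reported 0.66 and 0.573.

```python
import math
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon (assumption): a real rule-based analyzer
# relies on a much larger, curated word list with graded scores.
LEXICON = {"cure": 1.0, "safe": 1.0, "protects": 1.0,
           "kills": -1.0, "hoax": -1.0, "dangerous": -1.0}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def label_sentiment(headline, threshold=0.0):
    """Sum lexicon scores and map the total to one of three tags."""
    score = sum(LEXICON.get(tok, 0.0) for tok in tokenize(headline))
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def split_by_sentiment(headlines):
    """Group headlines by tag so each group can be topic-modeled separately."""
    groups = defaultdict(list)
    for headline in headlines:
        groups[label_sentiment(headline)].append(headline)
    return dict(groups)


def umass_coherence(topic_words, documents):
    """UMass-style coherence for one topic (assumed measure, see lead-in).

    topic_words must be ordered from most to least probable. For every
    pair, add log((D(w_i, w_j) + 1) / D(w_i)), where D counts documents
    containing all the given words and w_i is the higher-ranked word.
    """
    doc_sets = [set(tokenize(doc)) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    total = 0.0
    for i, j in combinations(range(len(topic_words)), 2):
        w_i, w_j = topic_words[i], topic_words[j]
        denom = doc_freq(w_i)
        if denom:  # skip words that never occur in the corpus
            total += math.log((doc_freq(w_i, w_j) + 1) / denom)
    return total


# Made-up example headlines, not taken from the study's dataset.
headlines = [
    "Vaccine is a hoax and kills people",
    "Garlic is a safe cure for the virus",
    "Lockdown extended in the city",
]
groups = split_by_sentiment(headlines)
```

In a fuller version, each list in `groups` would be passed to a topic model such as LDA, and the top words of each resulting topic would be scored with a coherence function to choose the number of topics.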
