Misinformation detection on online social networks using pretrained language models

IF 6.9 · Zone 1 (Management) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management, Pub Date: 2025-08-01, DOI: 10.1016/j.ipm.2025.104342

Pir Noman Ahmad , Adnan Muhammad Shah , KangYoon Lee , Wazir Muhammad

Volume 63, Issue 1, Article 104342. Full text: https://www.sciencedirect.com/science/article/pii/S0306457325002833

Citations: 0

Abstract

The growing prevalence of online misinformation poses substantial threats, notably by undermining the integrity of democratic processes and reducing the effectiveness of public health efforts. The effectiveness of existing countermeasures, such as user education and content removal, remains unclear, primarily because confirmation bias and peer pressure hinder users from identifying noncredible information. To address these challenges, this study proposes a state-of-the-art approach built on transformer-based models, including bidirectional encoder representations from transformers (BERT), GPT-2, and XLNet. These models use attention mechanisms to process all tokens of a document jointly and capture its contextual subtleties, enabling highly accurate misinformation detection and classification in dynamic and complex online narratives. A transformer-based pretrained language model is used to analyze a large corpus of tweets related to misinformation events concerning the 2020 U.S. election. Although isolated interventions are found to be ineffective, a synergistic approach is shown to reduce misinformation prevalence by 87.9% within a 40-min delay, based on an 80% credibility interval. These findings highlight the potential of empirical models to inform policies, enhance content moderation practices, and strengthen public resilience against misinformation.
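The abstract attributes the models' accuracy to attention. As a rough illustration only, and not the authors' implementation, the sketch below computes single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, the core operation shared by BERT, GPT-2, and XLNet. It is written in dependency-free Python; the function names and the toy 3-token embeddings are hypothetical. Real models add learned query/key/value projections, multiple heads, masking (causal in GPT-2, permutation-based in XLNet), and many stacked layers.

```python
import math
from typing import List

# Hypothetical helper type and function names: a generic illustration of
# scaled dot-product attention, not the paper's code.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Row i holds how strongly query token i attends to every key token."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    return [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention without masking: softmax(Q K^T / sqrt(d_k)) V."""
    return matmul(attention_weights(q, k), v)


if __name__ == "__main__":
    # Made-up 2-dimensional "embeddings" for a 3-token input.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Self-attention: queries, keys, and values all come from the same tokens.
    for row in attention_weights(tokens, tokens):
        print([round(w, 3) for w in row])
    print(scaled_dot_product_attention(tokens, tokens, tokens))
```

Each output row is a weighted average of the value rows, so every token's new representation mixes in information from the tokens it attends to most strongly. This is the sense in which attention lets a model read each word of a tweet in the context of the others.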




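Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days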

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.