The adaptive community-response (ACR) method for collecting misinformation on social media

IF 6.4 | Zone 2, Computer Science | JCR Q1: COMPUTER SCIENCE, THEORY & METHODS

Journal of Big Data | Pub Date: 2024-02-24 | DOI: 10.1186/s40537-024-00894-w

Julian Kauk, Helene Kreysa, André Scherag, Stefan R. Schweinberger


Citation count: 0

Abstract

Social media can be a major accelerator of the spread of misinformation, thereby potentially compromising both individual well-being and social cohesion. Despite significant recent advances, the study of online misinformation is a relatively young field facing several (methodological) challenges. In this regard, the detection of online misinformation has proven difficult, as large-scale online data streams require (semi-)automated, highly specific and therefore sophisticated methods to separate posts containing misinformation from irrelevant posts. In the present paper, we introduce the adaptive community-response (ACR) method, an unsupervised technique for the large-scale collection of misinformation on Twitter (now known as 'X'). The ACR method is based on previous findings showing that Twitter users occasionally reply to misinformation with fact-checking by referring to specific fact-checking sites (crowdsourced fact-checking). In a first step, we captured such misinforming but fact-checked tweets. In a second step, these tweets were used to extract specific linguistic features (keywords), enabling us, in a third step, to also collect those misinforming tweets that were not fact-checked at all. We first present a mathematical framework of our method, followed by an explicit algorithmic implementation. We then evaluate ACR on the basis of a comprehensive dataset consisting of more than 25 million tweets, belonging to more than 300 misinforming stories. Our evaluation shows that ACR is a useful extension to the field's pool of methods, enabling researchers to collect online misinformation more comprehensively. Text similarity measures clearly indicated correspondence between the claims of false stories and the ACR tweets, even though ACR performance was heterogeneously distributed across the stories. A baseline comparison to the fact-checked tweets showed that the ACR method can detect story-related tweets to a comparable degree, while being sensitive to different types of tweets: fact-checked tweets tend to be driven by high outreach (as indicated by a high number of retweets), whereas the sensitivity of the ACR method extends to tweets exhibiting lower outreach. Taken together, ACR's capacity as a valuable methodological contribution to the field is based on (i) the adoption of prior, pioneering research in the field, (ii) a well-formalized mathematical framework and (iii) an empirical foundation via a comprehensive set of indicators.
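The three steps described in the abstract can be illustrated with a minimal sketch on toy data. This is not the authors' implementation: the list of fact-checking domains, the stopword list, the tweet dictionaries, the function names, and the thresholds `min_df` and `min_hits` are all assumptions made for illustration, and the keyword extraction here is a plain document-frequency filter rather than the paper's formal framework.

```python
import re
from collections import Counter

# Assumed list of fact-checking sites; the paper's actual list is not given here.
FACT_CHECK_DOMAINS = ("snopes.com", "politifact.com")

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "in", "and", "that", "on"}


def tokenize(text):
    """Lowercase a text and return its alphabetic tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fact_checked_ids(tweets):
    """Step 1: ids of tweets that received a reply linking to a fact-checking site."""
    return {
        t["reply_to"]
        for t in tweets
        if t["reply_to"] is not None
        and any(domain in t["text"].lower() for domain in FACT_CHECK_DOMAINS)
    }


def extract_keywords(texts, min_df=2):
    """Step 2: tokens that occur in at least min_df of the fact-checked tweets."""
    doc_freq = Counter(tok for text in texts for tok in set(tokenize(text)))
    return {word for word, n in doc_freq.items() if n >= min_df}


def collect(tweets, keywords, seed_ids, min_hits=2):
    """Step 3: original tweets outside the seed set that match enough keywords."""
    return [
        t["id"]
        for t in tweets
        if t["id"] not in seed_ids
        and t["reply_to"] is None
        and len(keywords & set(tokenize(t["text"]))) >= min_hits
    ]


# Toy tweets (entirely made up); reply_to holds the id of the tweet replied to.
tweets = [
    {"id": 1, "reply_to": None, "text": "Vaccine contains microchips, wake up"},
    {"id": 2, "reply_to": 1, "text": "False, see https://www.snopes.com/fact-check/example"},
    {"id": 3, "reply_to": None, "text": "They hide microchips in every vaccine dose"},
    {"id": 4, "reply_to": 3, "text": "Debunked: politifact.com/example"},
    {"id": 5, "reply_to": None, "text": "My aunt says the vaccine has microchips"},
    {"id": 6, "reply_to": None, "text": "Lovely weather in Jena today"},
    {"id": 7, "reply_to": 6, "text": "Agreed!"},
]

by_id = {t["id"]: t for t in tweets}
seeds = fact_checked_ids(tweets)
keywords = extract_keywords([by_id[i]["text"] for i in seeds])
collected = collect(tweets, keywords, seeds)
print(sorted(seeds), sorted(keywords), collected)
```

In this toy run, tweet 5 repeats the claim but was never fact-checked, so only the keyword step can reach it. That is the kind of lower-outreach tweet the abstract says ACR is sensitive to.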




Source journal

Journal of Big Data (Computer Science: Information Systems)

CiteScore: 17.80

Self-citation rate: 3.70%

Articles published: 105

Review time: 13 weeks

About the journal: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.