MMCoVaR: multimodal COVID-19 vaccine focused data repository for fake news detection and a baseline architecture for classification

Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. Pub Date: 2021-09-14. DOI: 10.1145/3487351.3488346

Mingxuan Chen, Xinqiao Chu, K. P. Subbalakshmi


Citations: 14

Abstract

The outbreak of COVID-19 has resulted in an "infodemic" that has encouraged the propagation of misinformation about COVID-19 and its purported cures, which in turn could negatively affect the adoption of recommended public health measures in the larger population. In this paper, we provide a new multimodal (images, text and temporal information) labeled dataset containing news articles and tweets on the COVID-19 vaccine. We collected 2,593 news articles from 80 publishers for one year between Feb 16th 2020 and May 8th 2021, and 24,184 Twitter posts between April 17th 2021 and May 8th 2021. We combine ratings from two news media ranking sites, Media Bias Chart and Media Bias/Fact Check (MBFC), to classify the news dataset into two levels of credibility: reliable and unreliable. Combining the two filters allows for higher labeling precision. We also propose a stance detection mechanism to annotate tweets with one of three levels of credibility: reliable, unreliable and inconclusive. We provide several statistics and other analytics, such as publisher distribution, publication date distribution and topic analysis. We also provide a novel architecture that classifies the news data as misinformation or truth, to establish a baseline performance for this dataset. The proposed architecture achieves an F-Score of 0.919 and an accuracy of 0.882 for fake news detection. Furthermore, we provide benchmark performance for misinformation detection on the tweet dataset. This new multimodal dataset can be used in research on the COVID-19 vaccine, including misinformation detection and the influence of fake COVID-19 vaccine information.
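The abstract does not spell out how the ratings from the two sites are combined. One plausible reading, shown here only as a minimal sketch, is that a publisher receives a label only when both sources rate it and agree, which trades coverage for labeling precision. The publisher names, the rating values, and the helper names `combined_label` and `label_articles` are all hypothetical and do not come from the paper.

```python
from typing import Optional

# Hypothetical per-publisher ratings. Real ratings would be taken from
# Media Bias Chart and Media Bias/Fact Check (MBFC); these are placeholders.
MEDIA_BIAS_CHART = {
    "publisher_a": "reliable",
    "publisher_b": "unreliable",
    "publisher_c": "reliable",
}
MBFC = {
    "publisher_a": "reliable",
    "publisher_b": "unreliable",
    "publisher_c": "unreliable",
}


def combined_label(publisher: str) -> Optional[str]:
    """Return a label only when both sources rate the publisher and agree.

    Assumed rule (not confirmed by the abstract): disagreement or a missing
    rating yields None, so the publisher is left out of the labeled set.
    """
    first = MEDIA_BIAS_CHART.get(publisher)
    second = MBFC.get(publisher)
    if first is None or second is None:
        return None
    return first if first == second else None


def label_articles(articles):
    """Attach the publisher-level label to each article, skipping unlabeled ones."""
    labeled = []
    for article in articles:
        label = combined_label(article["publisher"])
        if label is not None:
            labeled.append({**article, "label": label})
    return labeled
```

Under this assumed rule, an article from `publisher_c` would be dropped because the two sources disagree, while articles from `publisher_a` and `publisher_b` would inherit their publisher's label.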

