COCO: an annotated Twitter dataset of COVID-19 conspiracy theories.

IF 2.3 Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of Computational Social Science Pub Date : 2023-04-04 DOI:10.1007/s42001-023-00200-3

Johannes Langguth, Daniel Thilo Schroeder, Petra Filkuková, Stefan Brenner, Jesper Phillips, Konstantin Pogorelov


Citations: 1

Abstract



The COVID-19 pandemic has been accompanied by a surge of misinformation on social media that covers a wide range of topics and contains many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets in which the stance of each tweet with respect to 12 different conspiracy topics was labeled manually. The dataset thus contains almost 42,000 labels, each of which was determined by majority among three expert annotators. The tweets were selected from COVID-19-related Twitter data spanning January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine-learning-based classifiers for stance detection and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end, we qualitatively analyze the tweets and discuss the structure of conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnections between the conspiracy categories as well as between the keywords.
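The labeling scheme described in the abstract (one label per tweet and topic, decided by majority among three annotators) can be illustrated with a minimal sketch. The stance names, the tweet and topic identifiers, and the choice to return `None` when all three annotators disagree are assumptions made for illustration; the abstract does not specify them. Only the counts 3495 and 12 come from the text.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

NUM_TWEETS = 3495  # stated in the abstract
NUM_TOPICS = 12    # stated in the abstract


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when no label has more than half of the votes
    (an assumed tie-handling rule, not taken from the paper).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def aggregate(
    annotations: Dict[Tuple[str, str], Sequence[str]]
) -> Dict[Tuple[str, str], Optional[str]]:
    """Map each (tweet_id, topic) pair to its majority label."""
    return {key: majority_label(votes) for key, votes in annotations.items()}


# One label per (tweet, topic) pair: 3495 * 12 = 41940, i.e. "almost 42,000".
total_labels = NUM_TWEETS * NUM_TOPICS

if __name__ == "__main__":
    # Hypothetical annotations; stance names are placeholders.
    example = {
        ("tweet_1", "topic_a"): ["promotes", "promotes", "discusses"],
        ("tweet_1", "topic_b"): ["unrelated", "unrelated", "unrelated"],
        ("tweet_2", "topic_a"): ["promotes", "discusses", "unrelated"],
    }
    print(aggregate(example))
    print(total_labels)
```

With three annotators and more than two possible stances, a three-way disagreement is possible, which is why the sketch makes the tie case explicit instead of silently picking one vote.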


Source journal

Journal of Computational Social Science (SOCIAL SCIENCES, MATHEMATICAL METHODS)

CiteScore

6.20

Self-citation rate

6.20%

Number of publications