Deep active learning for misinformation detection using geometric deep learning

Q1 Social Sciences

Online Social Networks and Media, Pub Date: 2023-01-01, DOI: 10.1016/j.osnem.2023.100244

Giorgio Barnabò, Federico Siciliano, Carlos Castillo, Stefano Leonardi, Preslav Nakov, Giovanni Da San Martino, Fabrizio Silvestri

Volume 33, Article 100244. Full text: https://www.sciencedirect.com/science/article/pii/S2468696423000034

Citations: 1

Abstract

Human fact-checkers are currently a key component of any semi-automatic misinformation detection pipeline. Although current state-of-the-art systems are mostly based on geometric deep learning models, these architectures still need human-labeled data to be trained and updated, because of shifting topic distributions and adversarial attacks. Most research on automatic misinformation detection, however, neither considers time-budget constraints on the number of pieces of news that can be manually fact-checked, nor tries to reduce the burden of fact-checking on annotators and journalists, most of whom work pro bono. The first contribution of this work is a thorough analysis of active learning (AL) strategies applied to Graph Neural Networks (GNNs) for misinformation detection. Based on this analysis, we then propose Deep Error Sampling (DES), a new deep active learning architecture that, when coupled with uncertainty sampling, performs as well as or better than the most common AL strategies and the only existing active learning procedure specifically targeting fake news detection. Overall, our experimental results on two benchmark datasets show that all AL strategies outperform random sampling: on average, they achieve a 2% increase in AUC for the same percentage of third-party fact-checked news and save up to 25% of the labeling effort for a desired level of classification performance. As for DES, while it does not always clearly outperform other strategies, it reduces the variance in performance between rounds, resulting in a more reliable method. To the best of our knowledge, we are the first to comprehensively study active learning in the context of misinformation detection and to show its potential to reduce the burden of third-party fact-checking without compromising classification performance.
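The abstract names uncertainty sampling as the query strategy paired with DES but does not spell out a selection rule. As a rough illustration only, the sketch below shows a generic entropy-based uncertainty sampling step for a binary classifier: given predicted probabilities for unlabeled news items, it picks the items the model is least sure about and would send them to fact-checkers. It is not the authors' DES architecture, and the function names, item ids, and probabilities are all hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary prediction with P(misinformation) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_most_uncertain(probs: dict[str, float], budget: int) -> list[str]:
    """Return the ids of the `budget` unlabeled items with the highest entropy."""
    ranked = sorted(
        probs,
        key=lambda item_id: binary_entropy(probs[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled news items (assumed values).
pool_predictions = {
    "news_a": 0.97,
    "news_b": 0.52,
    "news_c": 0.10,
    "news_d": 0.61,
}

# With a labeling budget of 2, the items closest to p = 0.5 are queried first.
to_fact_check = select_most_uncertain(pool_predictions, budget=2)
print(to_fact_check)
```

In a full active learning round, the selected items would be labeled by fact-checkers, added to the training set, and the model retrained before the next selection. A random-sampling baseline would simply draw `budget` ids uniformly from the pool instead.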

Source journal

Online Social Networks and Media (Social Sciences: Communication)

CiteScore

10.60

Self-citation rate

0.00%

Articles published

Review time

44 days