Weakly Supervised Learning for Fake News Detection on Twitter

2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Pub Date: 2018-08-01. DOI: 10.1109/ASONAM.2018.8508520

Stefan Helmstetter, Heiko Paulheim


Citations: 134

Abstract

The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward binary classification problem, the major challenge is the collection of large enough training corpora, since manual annotation of tweets as fake or non-fake news is an expensive and tedious endeavor. In this paper, we discuss a weakly supervised approach, which automatically collects a large-scale, but very noisy training dataset comprising hundreds of thousands of tweets. During collection, we automatically label tweets by their source, i.e., trustworthy or untrustworthy source, and train a classifier on this dataset. We then use that classifier for a different classification target, i.e., the classification of fake and non-fake tweets. Although the labels are not accurate according to the new classification target (not all tweets by an untrustworthy source need to be fake news, and vice versa), we show that despite this unclean, inaccurate dataset, it is possible to detect fake news with an F1 score of up to 0.9.
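The weak-supervision idea in the abstract can be illustrated with a minimal sketch: tweets are labeled only by the trustworthiness of their source, a classifier is trained on those noisy labels, and it is then scored with F1 against hand-labeled fake/non-fake tweets. Everything concrete here is an assumption for illustration and not taken from the paper: the source names, the toy tweets, and the choice of a small multinomial naive Bayes model with whitespace tokenization.

```python
import math
from collections import Counter

# Hypothetical source list; the paper's actual source lists are not shown here.
UNTRUSTED = {"clickbait_hub"}

def weak_label(tweet):
    """Noisy label from the source alone: 1 = untrustworthy, 0 = trustworthy."""
    return 1 if tweet["source"] in UNTRUSTED else 0

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(tweets):
    """Fit a multinomial naive Bayes model on the weakly labeled tweets."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for tweet in tweets:
        label = weak_label(tweet)
        class_counts[label] += 1
        word_counts[label].update(tokenize(tweet["text"]))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class (0 or 1) with the higher log-posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are skipped
                # Laplace (add-one) smoothing
                count = word_counts[label][word] + 1
                score += math.log(count / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy training set, labeled only via the source (invented examples).
train = [
    {"source": "reliable_daily", "text": "city council approves new budget"},
    {"source": "reliable_daily", "text": "local team wins championship game"},
    {"source": "clickbait_hub", "text": "shocking secret cure doctors hate"},
    {"source": "clickbait_hub", "text": "you will not believe this shocking hoax"},
]
model = train_naive_bayes(train)

# Small hand-labeled evaluation set for the real target: fake (1) or not (0).
eval_texts = ["shocking secret hoax revealed",
              "council approves budget for new school"]
eval_labels = [1, 0]
predictions = [predict(model, text) for text in eval_texts]
print(predictions, f1_score(eval_labels, predictions))
```

The point of the sketch is the mismatch between the two label sets: training uses `weak_label`, which only looks at the source, while evaluation uses separate hand-assigned fake/non-fake labels, mirroring the change of classification target described in the abstract.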




Source journal

2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

Self-citation rate

0.00%

Publication count