Classifying Fake and Real Neurally Generated News

2021 Swedish Workshop on Data Science (SweDS) Pub Date : 2021-12-02 DOI:10.1109/SweDS53855.2021.9638268

Anitha Govindaraju, J. Griffith



Abstract

Natural Language Processing (NLP) techniques such as language modelling have progressed rapidly, and with them the idea of “automated journalism” is emerging: computer programs that generate news articles from existing news headlines or from the body of a news article. These advances also bring disadvantages. Specifically, adversaries use these techniques to create fake news articles, called “neural fake news”, which imitate the style and appearance of real news to produce targeted propaganda intended to confuse people. Humans find neural fake news more trustworthy than human-written disinformation [1]. The goal of this research is to classify various types of neurally generated news as real or fake based on its genuineness. In a real-world scenario, humans evaluate the genuineness of news by relying on a model of the world, i.e., by checking whether the content of the news matches the content from a reliable news source (e.g., Associated Press). In this work we use a Recurrent Neural Network (RNN), specifically a Siamese bidirectional LSTM (BiLSTM), as a Semantic Textual Similarity (STS) model that compares real news with neural news to determine whether the neural news is fake. To train and test the model, three datasets were created: the first contains real news extracted from a common crawl; the second is a neural fake news dataset generated using language modelling techniques; the third is a neural real news dataset generated using textual data augmentation techniques. The Siamese BiLSTM model is found to produce accurate similarity scores between real news and neural news, which allow the neural news to be classified as neural real or neural fake news.
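The decision step described in the abstract, scoring a neural article against a trusted real article and labelling it from that score, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The trained Siamese BiLSTM is replaced by a plain bag-of-words cosine similarity so that the sketch runs without any deep learning library. The function names, the 0.5 threshold, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Stand-in similarity: cosine between bag-of-words count vectors.

    ASSUMPTION: the paper scores pairs with a trained Siamese BiLSTM;
    this simple measure only takes its place so the sketch can run.
    """
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def classify_neural_news(real_news: str, neural_news: str, threshold: float = 0.5) -> str:
    """Label neural news by its similarity to a trusted real article.

    ASSUMPTION: the 0.5 threshold is hypothetical, not taken from the paper.
    """
    score = cosine_similarity(real_news, neural_news)
    return "neural real" if score >= threshold else "neural fake"


# Made-up example sentences, for illustration only.
real = "The city council approved the new transit budget on Monday."
paraphrase = "On Monday the city council approved the new transit budget."
unrelated = "Scientists claim the moon landing was staged in a film studio."

print(classify_neural_news(real, paraphrase))
print(classify_neural_news(real, unrelated))
```

A word-overlap measure like this one cannot tell apart two articles that reuse the same words to make different claims, which is one reason a learned semantic similarity model is a more plausible choice for the task the abstract describes.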

