Classifying Fake and Real Neurally Generated News
Anitha Govindaraju, J. Griffith
2021 Swedish Workshop on Data Science (SweDS), published 2021-12-02
DOI: 10.1109/SweDS53855.2021.9638268 (https://doi.org/10.1109/SweDS53855.2021.9638268)
Abstract
Natural Language Processing (NLP) techniques such as language modelling have progressed to the point where “automated journalism”, i.e., generating news articles with computer programs from existing headlines or article bodies, is emerging. These advances also bring risks. Specifically, adversaries use the same techniques to create fake news articles, called “neural fake news”, which imitate the style and appearance of real news to deliver targeted propaganda that confuses readers. Humans find neural fake news more trustworthy than human-written disinformation [1]. The goal of this research is to classify various types of neurally generated news as real or fake according to its genuineness. In a real-world scenario, humans evaluate the genuineness of news by relying on a model of the world, i.e., by checking whether the content of the news matches content from a reliable news source (e.g., the Associated Press). In this work we use a Recurrent Neural Network (RNN), specifically a Siamese Bi-directional LSTM (BiLSTM), as a Semantic Textual Similarity (STS) model that compares real news with neural news to determine whether the neural news is fake. To train and test the model, three datasets were created: the first contains real news extracted from a common crawl; the second is a neural fake news dataset generated using language modelling techniques; the third is a neural real news dataset generated using textual data augmentation techniques. We find that the Siamese BiLSTM model produces similarity scores between real news and neural news that are accurate enough to classify the neural news as neural real or neural fake.
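The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: a bag-of-words count vector stands in for the BiLSTM encoder, cosine similarity stands in for the learned similarity score, and the function names, the example sentences, and the 0.5 threshold are all assumptions made for illustration. What it keeps is the Siamese idea that one shared encoder processes both the trusted article and the generated article, and that the resulting similarity score drives the neural real / neural fake decision.

```python
import math
from collections import Counter


def encode(text):
    """Shared encoder applied to both inputs (the 'Siamese' part).

    Assumption: a bag-of-words count vector replaces the BiLSTM
    sentence encoder named in the abstract.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


def classify(real_article, neural_article, threshold=0.5):
    """Label a generated article by its similarity to a trusted reference.

    The threshold is hypothetical; the abstract does not state one.
    """
    score = cosine_similarity(encode(real_article), encode(neural_article))
    label = "neural real" if score >= threshold else "neural fake"
    return label, score


# Hypothetical example texts, not taken from the paper's datasets.
reference = "the council approved the new transit budget on monday"
paraphrase = "on monday the council approved the new budget for transit"
unrelated = "scientists confirm the moon is drifting toward the sun"

print(classify(reference, paraphrase))
print(classify(reference, unrelated))
```

With these toy inputs the paraphrase scores well above the threshold and the unrelated text well below it. A trained BiLSTM encoder would replace `encode` so that the score reflects meaning rather than word overlap.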