Fighting Fake News Using Deep Learning: Pre-trained Word Embeddings and the Embedding Layer Investigated

Proceedings of the 2020 3rd International Conference on Computational Intelligence and Intelligent Systems. Publication date: 2020-11-13. DOI: 10.1145/3440840.3440847

Fantahun Gereme, William Zhu

Citations: 9

Abstract

Fake news is increasingly becoming a threat to individuals, society, news systems, governments and democracy. The need to fight it is growing, and a range of studies have shown promising results. Deep learning methods and word embeddings have contributed substantially to the design of detection mechanisms. However, the lack of sufficient datasets and the open question of which word embedding best captures content features have made it difficult to build sufficiently accurate detection methods. In this work, we prepared a dataset by scraping 13 years of continuous data, which we believe will help narrow this gap. We also proposed a deep learning model for early detection of fake news that uses convolutional neural networks and long short-term memory networks. We evaluated three pre-trained word embeddings in the context of the fake news problem using different measures. Using the proposed model, we ran a series of experiments on three real-world datasets, including ours. Results showed that the choice of pre-trained embedding can be arbitrary. However, embeddings trained purely on the fake news dataset, and pre-trained embeddings that were allowed to update during training, performed relatively better than static embeddings. Higher-dimensional embeddings yielded better results than lower-dimensional ones, and this held for all the datasets used.
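The abstract contrasts three ways of setting up the embedding layer: static pre-trained vectors, pre-trained vectors that are allowed to update, and embeddings learned from the fake news data alone. The following is a minimal pure-Python sketch of that distinction, not the authors' implementation. The helper names `build_embedding_matrix` and `EmbeddingLayer`, the reserved padding index 0, the uniform(-0.05, 0.05) initialisation for words without a pre-trained vector, and the plain SGD update are all illustrative assumptions. In PyTorch, the same static/updatable switch is usually expressed with the `freeze` argument of `nn.Embedding.from_pretrained`.

```python
import random
from typing import Dict, List, Sequence

PAD_ID = 0  # assumption: index 0 is reserved for padding


def build_embedding_matrix(
    vocab: Dict[str, int],
    pretrained: Dict[str, List[float]],
    dim: int,
    seed: int = 0,
) -> List[List[float]]:
    """Return one row of length `dim` per vocabulary index.

    Words present in `pretrained` (e.g. vectors read from a GloVe or
    word2vec file) are copied; other words get small random values;
    the padding row stays all zeros.
    """
    rng = random.Random(seed)
    size = max(vocab.values()) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in vocab.items():
        if idx == PAD_ID:
            continue  # padding row stays zero
        vec = pretrained.get(word)
        if vec is None:
            # No pre-trained vector (or training from scratch): random init.
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        elif len(vec) != dim:
            raise ValueError(f"vector for {word!r} has length {len(vec)}, expected {dim}")
        else:
            matrix[idx] = [float(x) for x in vec]
    return matrix


class EmbeddingLayer:
    """Lookup table whose rows are either frozen (static) or updated."""

    def __init__(self, weights: List[List[float]], trainable: bool) -> None:
        # Copy the rows so updates never mutate the caller's matrix.
        self.weights = [list(row) for row in weights]
        self.trainable = trainable

    def lookup(self, token_ids: Sequence[int]) -> List[List[float]]:
        # Forward pass: map each token id to (a copy of) its vector.
        return [list(self.weights[i]) for i in token_ids]

    def apply_gradients(
        self,
        token_ids: Sequence[int],
        grads: Sequence[Sequence[float]],
        lr: float,
    ) -> None:
        # Plain SGD on the rows that were looked up; a static layer ignores it.
        if not self.trainable:
            return
        for i, g in zip(token_ids, grads):
            if i == PAD_ID:
                continue  # keep the padding vector fixed at zero
            row = self.weights[i]
            for k in range(len(row)):
                row[k] -= lr * g[k]
```

Under this sketch, the static setting corresponds to `EmbeddingLayer(matrix, trainable=False)`, the updatable setting to `trainable=True`, and training purely from the fake news data to passing an empty `pretrained` dict so every non-padding row starts random. The high- versus low-dimensional comparison in the abstract corresponds to varying `dim`.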
