Developed Models Based on Transfer Learning for Improving Fake News Predictions

J. Univers. Comput. Sci. Pub Date : 2023-05-28 DOI:10.3897/jucs.94081

Tahseen A. Wotaifi, B. N. Dhannoon

Abstract

Amid global concern over the spread of fake news on social media, a large body of research has emerged to address the problem. The rapid growth of social media and online forums has made it easy for legitimate news to become mixed with misleading news, distorting people's perceptions. This study therefore uses deep learning, pre-trained models, and machine learning to predict Arabic and English fake news on three publicly available datasets: the Fake-or-Real dataset, the AraNews dataset, and the Sentimental LIAR dataset. A hybrid network built on the GloVe (Global Vectors) and FastText pre-trained models is proposed to improve fake news prediction. In this network, a CNN (Convolutional Neural Network) identifies the most important features, a BiGRU (Bidirectional Gated Recurrent Unit) captures long-term dependencies in the sequences, and a multi-layer perceptron (MLP) classifies each news article as fake or real. In addition, an Improved Random Forest Model is built on the embedding values extracted from the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model together with the relevant speaker-based features, which are identified by a fuzzy model based on feature selection methods. Accuracy was used to evaluate the proposed models: prediction accuracy reached 0.9935, 0.9473, and 0.7481 on the Fake-or-Real, AraNews, and Sentimental LIAR datasets, respectively. The proposed models improve significantly on the accuracy of previous studies that used the same datasets for predicting Arabic and English fake news.
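The hybrid network described in the abstract (embedding, CNN, BiGRU, then MLP) could be sketched roughly as follows in PyTorch. This is only an illustration of the data flow, not the authors' implementation: the class name `HybridCnnBiGru`, every layer size, the pooling step, the randomly initialised embedding table (which would be loaded from GloVe or FastText vectors in the setting the abstract describes), and the convention that the output is the probability of "fake" are all assumptions, since the abstract gives no hyperparameters.

```python
import torch
from torch import nn


class HybridCnnBiGru(nn.Module):
    """Hypothetical sketch: Embedding -> Conv1d -> max-pool -> BiGRU -> MLP."""

    def __init__(self, vocab_size, embed_dim=100, conv_channels=64,
                 kernel_size=3, gru_hidden=32, mlp_hidden=16):
        super().__init__()
        # Assumption: random initialisation; pre-trained GloVe/FastText
        # vectors would be copied into this table instead.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # The convolution extracts local n-gram features from the embeddings.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size)
        self.pool = nn.MaxPool1d(kernel_size=2)
        # The bidirectional GRU reads the feature sequence in both directions.
        self.bigru = nn.GRU(conv_channels, gru_hidden,
                            batch_first=True, bidirectional=True)
        # A small MLP maps the recurrent summary to a single logit.
        self.mlp = nn.Sequential(
            nn.Linear(2 * gru_hidden, mlp_hidden),
            nn.ReLU(),
            nn.Linear(mlp_hidden, 1),
        )

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)              # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))       # (batch, channels, seq_len - k + 1)
        x = self.pool(x)                   # sequence length halved
        x = x.transpose(1, 2)              # (batch, reduced_len, channels)
        _, h_n = self.bigru(x)             # h_n: (2, batch, gru_hidden)
        # Concatenate the final forward and backward hidden states.
        summary = torch.cat([h_n[0], h_n[1]], dim=1)
        # Assumed label convention: output is the probability of "fake".
        return torch.sigmoid(self.mlp(summary))


# Usage with made-up token ids: 4 articles of 50 tokens each.
model = HybridCnnBiGru(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 50))
with torch.no_grad():
    probs = model(batch)                   # shape (4, 1), values in [0, 1]
```

The sketch covers only the first of the two proposed models; the Improved Random Forest Model, which combines BERT embeddings with speaker-based features, is not shown.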
