COVID-19 Fake News Detection System

R. Malhotra, Anushree Mahur, Achint
Published in: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Publication date: 2022-01-27
DOI: https://doi.org/10.1109/confluence52989.2022.9734144
Citations: 3

Abstract

This paper addresses the rapidly growing COVID-19 infodemic, which calls for an effective framework for detecting fake or misleading news about the COVID-19 virus and disease. We use a dataset obtained from ConstraintAI'21, consisting of 10,700 tweets and online posts that contain fake and real news about COVID-19. The Machine Learning (ML) algorithms compared for classifying a given news item or tweet as real or fake are Logistic Regression (LR), K-Nearest Neighbor (KNN), Linear Support Vector Machine (LSVM), Random Forest Classifier (RFC), Decision Tree (DT), Naive Bayes (NB), and Stochastic Gradient Descent (SGD). Two feature extraction techniques were used: count vectorization and TF-IDF. The Deep Learning (DL) algorithms, implemented with the Adam optimizer, are Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). The best testing accuracy was achieved by the LSVM model with TF-IDF features, followed by the SGD classifier with TF-IDF features. LR, DT, and RFC performed better with count vectorization, whereas LSVM, KNN, NB, and SGD were more accurate with TF-IDF. Among the DL algorithms, the LSTM model performed slightly better than the others.
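
The two feature extraction techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 with L2 normalization, the four invented toy posts, and the 1-nearest-neighbour rule that stands in for the KNN classifier. It does not reproduce the authors' pipeline or the ConstraintAI'21 data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted token order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def count_vector(doc, vocab):
    """Count vectorization: raw term counts; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def idf_weights(docs, vocab):
    """Smoothed IDF per vocabulary column: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    ordered = sorted(vocab, key=vocab.get)  # tokens in column order
    return [math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in ordered]


def tfidf_vector(doc, vocab, idf):
    """TF-IDF: counts reweighted by IDF, then scaled to unit L2 norm."""
    weighted = [c * w for c, w in zip(count_vector(doc, vocab), idf)]
    norm = math.sqrt(sum(x * x for x in weighted))
    return [x / norm for x in weighted] if norm > 0 else weighted


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def predict_1nn(query_vec, train_vecs, train_labels):
    """Return the label of the most cosine-similar training vector."""
    best = max(range(len(train_vecs)),
               key=lambda i: cosine(query_vec, train_vecs[i]))
    return train_labels[best]


# Invented toy posts (hypothetical, NOT from the ConstraintAI'21 dataset).
train_texts = [
    "vaccine trial results published by health ministry",
    "health ministry reports new cases today",
    "drinking bleach cures the virus",
    "miracle garlic cures the virus overnight",
]
train_labels = ["real", "real", "fake", "fake"]
query = "garlic cures the virus"

vocab = build_vocabulary(train_texts)
idf = idf_weights(train_texts, vocab)

count_train = [count_vector(t, vocab) for t in train_texts]
tfidf_train = [tfidf_vector(t, vocab, idf) for t in train_texts]

count_pred = predict_1nn(count_vector(query, vocab), count_train, train_labels)
tfidf_pred = predict_1nn(tfidf_vector(query, vocab, idf), tfidf_train, train_labels)
print("count features:", count_pred)
print("TF-IDF features:", tfidf_pred)
```

The only difference between the two representations is the IDF reweighting and normalization step: a token that appears in few posts (here "garlic") receives a larger weight than one that appears in several (here "the"). On a toy set this small both representations pick the same neighbour, so the sketch says nothing about which one is more accurate on real data.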