COVID-19 Fake News Detection System
Authors: R. Malhotra, Anushree Mahur, Achint
Venue: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Published: 2022-01-27
DOI: 10.1109/confluence52989.2022.9734144 (https://doi.org/10.1109/confluence52989.2022.9734144)
Source: Semantic Scholar
Citations: 3
Abstract
This paper addresses the rapidly growing COVID-19 infodemic and the resulting need for an effective framework for detecting fake or misleading news related to the COVID-19 virus and disease. We use a dataset obtained from ConstraintAI'21, consisting of 10,700 tweets and online posts of fake and real news concerning COVID-19. The Machine Learning (ML) algorithms compared for classifying a given news item or tweet as real or fake are Logistic Regression (LR), K-Nearest Neighbor (KNN), Linear Support Vector Machine (LSVM), Random Forest Classifier (RFC), Decision Tree (DT), Naive Bayes (NB), and a Stochastic Gradient Descent (SGD) classifier. Two feature extraction techniques were used: count vectorization and TF-IDF. The Deep Learning (DL) algorithms, implemented using the Adam optimizer, are Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). The best testing accuracy was achieved by the LSVM model with TF-IDF features, followed by the SGD classifier with TF-IDF features. LR, DT, and RFC performed better with count vectorization, whereas LSVM, KNN, NB, and SGD were more accurate with TF-IDF. Among the DL algorithms, the LSTM model performed slightly better than the others.
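
The abstract compares two feature extraction techniques without showing how they differ. The following is a minimal, standard-library sketch of both, not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 with L2 row normalization (one common convention), the function names, and the three toy posts are all assumptions made for illustration; none of them come from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def count_vectorize(docs):
    """Return (vocabulary, matrix) where each row holds raw term counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix


def tfidf_vectorize(docs):
    """Return (vocabulary, matrix) of L2-normalized TF-IDF weights."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed inverse document frequency (an assumed convention).
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    matrix = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        matrix.append([v / norm for v in weighted])
    return vocab, matrix


# Hypothetical toy posts, invented for this sketch.
posts = [
    "masks reduce covid spread",
    "garlic cures covid",
    "vaccines reduce covid deaths",
]
vocab, counts = count_vectorize(posts)
_, tfidf = tfidf_vectorize(posts)
```

In this toy corpus, "covid" occurs in every post, so count vectorization gives it the same weight as any other word that occurs once, while TF-IDF weights it below rarer terms such as "garlic". Either matrix could then be passed to a classifier such as a linear SVM. Which representation works better depends on the model, consistent with the mixed results the abstract reports.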