Improved Fake News Detection Method based on Deep Learning and Comparative Analysis with other Machine Learning approaches
R. Khan, A. S. M. Shihavuddin, M. M. Syeed, Rakib Ul Haque, Mohammad Faisal Uddin
2022 International Conference on Engineering and Emerging Technologies (ICEET), published 2022-10-27
DOI: 10.1109/ICEET56468.2022.10007214
Abstract
Fake news identification has recently attracted extensive research, most of it focused on the classification method. These methods suffer from limited accuracy and fail to perform well on diverse datasets because they lack a generalized feature extraction method. This study aims to improve the performance of fake news identification models with a generalized and robust feature extraction method that addresses these problems. It uses a popular fake news dataset available on Kaggle. The proposed approach applies stemming, which converts each word to its root form. TF-IDF and BERT then convert the texts into feature vectors for the machine learning models (Logistic Regression, Naive Bayes, Support Vector Machine, Passive Aggressive, K-means, K-medoids, and K-nearest neighbor) and the deep learning model (BERT), respectively. Performance analysis shows that BERT combined with stemming, a Natural Language Processing (NLP) technique, outperforms all previous methods and achieves an accuracy of 99.74%, compared with 98.90% for the previous state-of-the-art method (fakeBERT). The primary reason for this gain is stemming, which transforms every word in a sentence to its root word and yields a more generalized vector that aids model performance. In addition, the support vector machine (linear kernel) and the passive-aggressive classifier, each paired with a stemmed TF-IDF vectorizer, also outperform the earlier approaches, with accuracies of 99.11% and 98.99%, respectively.
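The stemming and TF-IDF step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `stem` function is a hypothetical, deliberately crude suffix stripper standing in for a proper stemmer, the three sample sentences are made up, and the weighting is the plain textbook form tf × log(N/df), whereas library implementations often use smoothed variants.

```python
import math
from collections import Counter

# Hypothetical, deliberately crude suffix stripper (an assumption for this
# sketch). A real pipeline would more likely use an established stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens, then stem."""
    return [stem(tok) for tok in text.lower().split() if tok.isalpha()]


def tfidf(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each stemmed term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Made-up sample texts, not taken from the Kaggle dataset.
docs = [
    "senators voted on the bill",
    "the senator votes today",
    "aliens landed in ohio",
]
vocab, vectors = tfidf(docs)
```

Because "voted" and "votes" reduce to the same stem, the first two documents share a feature that they would not share without stemming, which is the kind of generalization the abstract credits for the performance gain. The resulting vectors could then be passed to any of the classifiers listed above.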