R. Khan, A. S. M. Shihavuddin, M. M. Syeed, Rakib Ul Haque, Mohammad Faisal Uddin
2022 International Conference on Engineering and Emerging Technologies (ICEET), published 2022-10-27. DOI: https://doi.org/10.1109/ICEET56468.2022.10007214
Improved Fake News Detection Method based on Deep Learning and Comparative Analysis with other Machine Learning approaches
Fake news identification has recently attracted extensive research, most of it focused on the classification method. These methods suffer from accuracy problems and fail to perform well on diverse datasets because they lack a generalized feature extraction method. This study aims to improve the score of the fake news identification model with a generalized and robust feature extraction method that addresses these problems. It uses a popular fake news dataset that is available on Kaggle. The proposed approach applies stemming, which converts each word to its corresponding root word. TF-IDF and BERT then convert the texts into feature vectors for the machine learning methods (Logistic Regression, Naive Bayes, Support Vector Machine, Passive Aggressive, K-means, K-medoids, and K-nearest neighbor) and for deep learning (BERT), respectively. Performance analysis shows that BERT combined with the stemming Natural Language Processing (NLP) technique outperforms all previous methods and achieves an accuracy of 99.74%, compared with 98.90% for the previous state-of-the-art method (fakeBERT). The authors attribute this gain primarily to stemming, which transforms all words in a sentence to their root words and yields a more generalized vector that aids model performance. In addition, the support vector machine (linear kernel) and the passive-aggressive classifier, each using a TF-IDF vectorizer with stemming, also outperform the previous methods, with accuracies of 99.11% and 98.99%, respectively.
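The stemming and TF-IDF steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the suffix-stripping `toy_stem` function is a hypothetical stand-in for a real stemmer (the abstract does not say which one was used), the smoothed IDF formula is one common variant chosen here as an assumption, and the two example sentences are invented for illustration.

```python
import math
from collections import Counter

# Assumption: a toy suffix stripper standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, keep purely alphabetic tokens, and stem each one."""
    return [toy_stem(tok) for tok in text.lower().split() if tok.isalpha()]


def tfidf(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of raw texts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Invented example texts that differ only in inflection.
docs = [
    "senators claimed the report leaked",
    "the senator claims reports leak",
]
vocab, vectors = tfidf(docs)
```

In this sketch the two sentences share no surface tokens other than "the", yet after stemming they reduce to the same five root forms and therefore receive identical feature vectors. That is the sense in which stemming produces a more generalized representation; a classifier such as a linear SVM would then be trained on vectors of this kind.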