Fake News Detection Using Machine Learning Techniques

2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). Pub Date: 2023-05-23. DOI: 10.1109/SERA57763.2023.10197712

Achhiya Sultana, Mahmudul Islam, Mahady Hasan, F. Ahmed

Citations: 0

Abstract

People spread a great deal of information on social media, both to update their status and to share important news with others. However, most of these platforms do not promptly verify users or their posts, and people cannot reliably identify fake news by hand. An automated system capable of detecting fake news is therefore needed. This research builds such a model using four machine learning algorithms. The experimental dataset combines two datasets containing nearly equal numbers of true and fake political news articles. Preprocessing begins with data cleaning: tokenization and the removal of punctuation, special characters, white space, redundant words, numerals, and English letters. This is followed by stemming and ends with data discretization. After analyzing the collected data, we used 80% of it to train each model and applied four classification algorithms to identify fake news articles: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting Classifier. The accuracy of the trained classifiers was evaluated on the remaining 20% of the data. The decision tree achieved the highest accuracy (99.60%), followed by gradient boosting (99.55%), random forest (99.10%), and logistic regression (98.99%). We also compared precision, recall, and F1-score, derived from each model's confusion matrix, to identify the best-performing model.
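The abstract describes the pipeline only at a high level. The following is a minimal, standard-library Python sketch (not the authors' code) of three of its steps under stated assumptions: text cleaning and stemming, an 80/20 train/test split, and the confusion-matrix metrics used to compare models. The stop-word list, the `crude_stem` suffix stripper (a toy stand-in for a real stemmer such as Porter's), and all function names are hypothetical. "Redundant word elimination" is assumed here to mean stop-word removal, data discretization is omitted because the abstract does not say what it involves, and the sketch does not attempt every cleaning step listed. The four classifiers themselves would typically come from a library; scikit-learn, for example, provides `LogisticRegression`, `DecisionTreeClassifier`, `RandomForestClassifier`, and `GradientBoostingClassifier`.

```python
import random
import re

# Hypothetical, deliberately tiny stop-word list (assumption: "redundant word
# elimination" means stop-word removal). The paper does not give its list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was", "were"}


def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, strip non-letters, tokenize, drop stop words, then stem."""
    text = text.lower()
    # Replace punctuation, special characters and numerals with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # str.split() tokenizes and also discards runs of white space.
    tokens = text.split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def split_80_20(samples: list, seed: int = 42) -> tuple[list, list]:
    """Shuffle a copy of the samples and return (train, test) in an 80/20 ratio."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict:
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `preprocess("The senators were voting on 3 new bills!")` yields `['senator', 'vot', 'new', 'bill']`; the truncated `vot` shows why a proper stemmer is preferable. The metrics follow the standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R).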
