Detection of Fake News and Hoaxes on Information from Web Scraping using Classifier Methods

F. W. Wibowo, Akhmad Dahlan, Wihayati
DOI: 10.1109/ISRITI54043.2021.9702824
Published in: 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 16 December 2021
Citations: 3

Abstract

Current technology lets people obtain information directly from handheld devices. This technology has both positive and negative effects: fake news and hoaxes have spread along with the social media applications these devices run. This paper aims to detect fake news and hoaxes using classification models. The models implemented are support vector machine (SVM), random forest, nearest centroid, stochastic gradient descent (SGD), decision tree (Tree), bagging, AdaBoost, gradient boosting, a multi-layer perceptron artificial neural network (MLP ANN), and K-nearest neighbors (K-NN).

The dataset consists of 1,116 Indonesian-language news items collected through web scraping, split 70% for training and 30% for testing. The test set contains 335 items: 205 fake news and hoax items and 130 real news items. The scraped web content was processed using natural language processing (NLP) methods.

Random forest is the best model for classifying fake news and hoaxes, with an accuracy of 89%. The next-highest scorers, in descending order, are SVM, gradient boosting, AdaBoost, SGD, and decision tree, all with accuracy above 80%. The remaining methods (nearest centroid, bagging, MLP ANN, and K-NN) score below 80%.
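The split and test-set figures in the abstract can be checked with a little arithmetic. The sketch below assumes the 30% test share is rounded up to a whole item; that reproduces the reported 335, and it matches scikit-learn's `train_test_split` convention for a fractional `test_size`, although the abstract does not say which tool the authors used. It also computes a majority-class baseline (label everything as fake/hoax) as a reference point for the reported accuracies.

```python
import math

TOTAL_ITEMS = 1116    # scraped Indonesian-language news items (from the abstract)
TEST_FRACTION = 0.30  # 70% train / 30% test (from the abstract)


def split_sizes(n_items: int, test_fraction: float) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test share up (assumed convention)."""
    n_test = math.ceil(n_items * test_fraction)
    return n_items - n_test, n_test


n_train, n_test = split_sizes(TOTAL_ITEMS, TEST_FRACTION)

# Test-set composition reported in the abstract.
FAKE_OR_HOAX, REAL = 205, 130
assert FAKE_OR_HOAX + REAL == n_test

# Accuracy of a trivial baseline that labels every test item "fake/hoax".
majority_baseline = FAKE_OR_HOAX / n_test

print(n_train, n_test)             # 781 335
print(f"{majority_baseline:.1%}")  # 61.2%
```

Because the test set is imbalanced (205 vs. 130), a classifier that never predicts "real" already scores about 61%, so the reported accuracies are best read against that floor rather than against 50%.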
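The abstract says only that the scraped content was processed with NLP methods, without listing the steps. A common choice for feeding text to the listed classifiers is tokenization followed by TF-IDF weighting; the sketch below shows that pipeline in plain Python under that assumption. The stop-word list, the ASCII-letter tokenizer, and the two headlines are hypothetical illustrations, not the authors' data or preprocessing. The IDF formula is the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Indonesian stop-word list; the abstract does
# not say which stop words (if any) the authors removed.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep runs of ASCII letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one L2-normalised TF-IDF vector (a sparse dict) per tokenized doc.

    IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: n * idf[t] for t, n in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Two made-up headlines, for illustration only.
headlines = [
    "Pemerintah membagikan uang gratis untuk semua warga!",
    "Pemerintah merilis data inflasi bulan ini.",
]
docs = [tokenize(h) for h in headlines]
vectors = tfidf(docs)
print(docs[1])  # ['pemerintah', 'merilis', 'data', 'inflasi', 'bulan']
```

A word that appears in every document (here `pemerintah`) receives the minimum IDF of 1, so it carries less weight than words that distinguish one headline from the other.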
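Nearest centroid, one of the ten models compared, is simple enough to write out in full, and it shows how the accuracy metric in the abstract is computed. The sketch below is a from-scratch illustration in plain Python, not the authors' code; it assumes Euclidean distance, and the two-dimensional feature vectors and labels are made up purely to exercise the functions.

```python
import math
from collections import defaultdict


def fit_centroids(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Average the training vectors of each class into one centroid."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(total, x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up 2-D feature vectors, for illustration only.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y_train = ["hoax", "hoax", "real", "real"]
X_test = [[0.7, 0.9], [0.3, 0.1], [0.6, 0.5]]
y_test = ["hoax", "real", "real"]

centroids = fit_centroids(X_train, y_train)
y_pred = [predict(centroids, x) for x in X_test]
print(y_pred, round(accuracy(y_test, y_pred), 3))  # ['hoax', 'real', 'hoax'] 0.667
```

The third test point is deliberately placed closer to the "hoax" centroid, so one of three predictions is wrong and the accuracy is 2/3. With real TF-IDF features the vectors are high-dimensional and sparse, but the fit and predict steps are the same.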