Detection of Fake News and Hoaxes on Information from Web Scraping using Classifier Methods
F. W. Wibowo, Akhmad Dahlan, Wihayati
2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2021-12-16
DOI: 10.1109/ISRITI54043.2021.9702824 (https://doi.org/10.1109/ISRITI54043.2021.9702824)
Current technology lets people obtain information directly from handheld gadgets. This technology has both good and bad effects, and one problem is that fake news and hoaxes have spread along with the social media applications accessed through these devices. This paper aims to detect fake news and hoaxes using classification models. The models implemented are support vector machine (SVM), random forest, nearest centroid, stochastic gradient descent (SGD), decision tree, bagging, AdaBoost, gradient boosting, multi-layer perceptron artificial neural network (MLP ANN), and K-nearest neighbors (K-NN). The dataset, obtained through web scraping, consists of 1116 Indonesian-language news items, split 70% for training and 30% for testing. The test set contains 335 items: 205 fake news and hoax items and 130 real news items. The web content is processed using natural language processing (NLP) methods. Random forest is the best model for classifying fake news and hoaxes, with an accuracy of 89%. The next-highest models are, in order, SVM, gradient boosting, AdaBoost, SGD, and decision tree, all with accuracy above 80%. The remaining methods have accuracy below 80%.
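The split and class counts reported in the abstract can be checked with a little arithmetic. The sketch below is an illustration only, not the authors' code: the function names are hypothetical, and rounding the test size up is an assumption chosen because it reproduces the reported 335 test items. It also computes the accuracy of a majority-class baseline (always predicting "fake") on the reported test set, which gives a reference point for the reported accuracy figures.

```python
import math
from collections import Counter

# Figures taken from the abstract.
TOTAL_ITEMS = 1116      # scraped Indonesian-language news items
TEST_FRACTION = 0.30    # 70% training / 30% testing
TEST_FAKE = 205         # fake news and hoax items in the test set
TEST_REAL = 130         # real news items in the test set


def split_sizes(n_items, test_fraction):
    """Return (n_train, n_test); rounding the test size up is an assumption."""
    n_test = math.ceil(n_items * test_fraction)
    return n_items - n_test, n_test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Predict the most frequent label for every item."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


n_train, n_test = split_sizes(TOTAL_ITEMS, TEST_FRACTION)

# Rebuild the test labels from the reported class counts.
y_test = ["fake"] * TEST_FAKE + ["real"] * TEST_REAL
baseline_acc = accuracy(y_test, majority_baseline(y_test))

print(n_train, n_test)          # 781 335
print(round(baseline_acc, 3))   # 0.612
```

On the reported test set the majority-class baseline scores about 61%, so the 89% reported for random forest is well above what the class imbalance alone would give.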