Classifying Cybercrime and Threat on Thai Online News: A Comparison of Supervised Learning Algorithms
Pongsarun Boonyopakorn, N. Wisitpongphan, Ukid Changsan
2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)
Published: 2023-06-25
DOI: 10.1109/ITC-CSCC58803.2023.10212562
Citations: 0
Abstract
A large volume of news articles from many sources is now available on the Internet. These articles can generally be classified into domains such as politics, entertainment, sports, and technology. Over the past decade, news about cybercrime and cyber threats has usually been filed under either the crime or the technology domain. However, such news should reach all citizens as quickly as possible in order to prevent further damage. In this paper, we therefore aim to separate news about cybercrimes or threats from a pool of news on different topics. The challenge is to handle new types of threats, such as call center scams and ransomware, as well as other emerging threats that exploit vulnerabilities in new technologies and applications. To tackle this problem, we developed methods and processes for classifying news into cybercrime- and threat-related domains using several machine learning techniques: Naive Bayes, KNN, SVM, Decision Trees, Random Forest, and Gradient Boosting. On a dataset of 5,000 news articles, consisting of 3,500 general news articles and 1,500 cybercrime-related articles, the best-performing technique is SVM, with an accuracy of 93.66% and a cross-validation accuracy of 92.34%. The second-best technique is Random Forest, with an accuracy of 92.46% and a cross-validation accuracy of 92.25%.
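The abstract reports a plain accuracy and a cross-validation accuracy for each technique but does not describe the pipeline behind them. The following is a minimal sketch of how the two numbers differ, using Naive Bayes (one of the six techniques named above) written in plain Python. Everything in it is an assumption made for illustration: the function names, the add-one smoothing, the fold layout, and the eight English toy headlines, which are hypothetical and not taken from the paper's dataset. Whitespace tokenization is also an assumption; Thai text is written without spaces between words, so a real pipeline would need a word segmenter first. The one figure taken from the abstract is the class split, which implies that always predicting "general" would already score 3,500 / 5,000 = 70%.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_counts):
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def cross_val_accuracy(docs, labels, k):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        train = [j for j in range(len(docs)) if j % k != i]
        test = [j for j in range(len(docs)) if j % k == i]
        model = train_nb([docs[j] for j in train], [labels[j] for j in train])
        scores.append(accuracy(model, [docs[j] for j in test], [labels[j] for j in test]))
    return sum(scores) / k


# Majority-class baseline implied by the 3,500 / 1,500 split in the abstract.
majority_baseline = 3500 / 5000

# Hypothetical toy headlines, ordered so each of the 4 folds holds one per class.
DOCS = [
    "ransomware attack locks hospital files",
    "football team wins national cup",
    "call center scam steals bank savings",
    "government announces new budget plan",
    "film festival opens in the capital",
    "phishing email steals bank passwords",
    "heavy rain floods northern provinces",
    "hackers exploit flaw in payment app",
]
LABELS = ["cyber", "general", "cyber", "general",
          "general", "cyber", "general", "cyber"]

model = train_nb(DOCS, LABELS)
print("majority baseline:", majority_baseline)
print("prediction:", predict_nb(model, "new ransomware scam steals passwords"))
print("4-fold CV accuracy:", cross_val_accuracy(DOCS, LABELS, k=4))
```

Accuracy measured on a single split depends on which articles happen to land in the test set, whereas the cross-validation figure averages over several held-out folds. That may be why the two numbers reported for each technique differ slightly. Both should be read against the 70% majority baseline rather than against 50%.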