Classifying Cybercrime and Threat on Thai Online News: A Comparison of Supervised Learning Algorithms

Pongsarun Boonyopakorn, N. Wisitpongphan, Ukid Changsan
Published in: 2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC)
Publication date: 2023-06-25
DOI: 10.1109/ITC-CSCC58803.2023.10212562

Abstract

A large volume of news and articles from many sources is now available on the Internet. News can generally be classified into domains such as politics, entertainment, sports, and technology. Over the past decade, news related to cybercrime and threats has usually been filed under either the crime or the technology domain. However, such news should reach all citizens as quickly as possible in order to prevent further damage. In this paper, we therefore aim to identify news related to cybercrime or threats within a pool of news on different topics. The challenge is to handle new types of threats, such as call center scams and ransomware, as well as other new threats that exploit vulnerabilities in new technologies and applications. To tackle this problem, we developed methods and processes for classifying news into domains related to cybercrime and threats using several machine learning techniques: Naive Bayes, KNN, SVM, Decision Trees, Random Forest, and Gradient Boosting. On a dataset of 5,000 news articles, consisting of 3,500 general news articles and 1,500 cybercrime-related articles, the best-performing technique for categorizing cybercrime-related news is SVM, with an accuracy of 93.66% and a cross-validation accuracy of 92.34%. The second-best technique is Random Forest, with an accuracy of 92.46% and a cross-validation accuracy of 92.25%.
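The comparison described in the abstract can be illustrated with a minimal sketch using scikit-learn. Everything beyond the six algorithm names and the two reported metrics (hold-out accuracy and cross-validation accuracy) is an assumption made for illustration: the abstract does not state the feature extraction, tokenization, hyperparameters, split ratio, or number of folds. The sketch assumes TF-IDF over character n-grams (one common choice for Thai, which is written without spaces between words), a linear SVM, and a tiny English placeholder corpus that stands in for the 5,000-article dataset.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy placeholder corpus (NOT the paper's data): 1 = cybercrime-related, 0 = general.
cyber = [
    "ransomware attack encrypts hospital records",
    "call center scam drains victims bank accounts",
    "phishing email steals online banking passwords",
    "hackers leak customer data after server breach",
    "malware app hijacks mobile banking sessions",
    "police warn of fake investment website fraud",
    "data breach exposes millions of user accounts",
    "scammers use fake sms links to steal credentials",
]
general = [
    "national team wins football match in final minute",
    "government announces new budget for rural roads",
    "film festival opens with award winning drama",
    "heavy rain floods streets in the northern province",
    "new smartphone model launches with larger screen",
    "prime minister meets trade delegation on exports",
    "singer releases new album ahead of concert tour",
    "farmers expect higher rice prices this season",
]
texts = cyber + general
labels = [1] * len(cyber) + [0] * len(general)


def build_models():
    """The six algorithm families named in the abstract; settings are assumed."""
    return {
        "Naive Bayes": MultinomialNB(),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "SVM": LinearSVC(),  # assumption: the kernel used in the paper is not stated
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    }


def compare(texts, labels, folds=3):
    """Return {model name: (hold-out accuracy, mean cross-validation accuracy)}."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    results = {}
    for name, clf in build_models().items():
        # Assumed features: TF-IDF over character 2- to 4-grams.
        pipe = make_pipeline(
            TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), clf
        )
        cv_acc = cross_val_score(
            pipe, x_train, y_train, cv=folds, scoring="accuracy"
        ).mean()
        holdout_acc = pipe.fit(x_train, y_train).score(x_test, y_test)
        results[name] = (holdout_acc, cv_acc)
    return results


if __name__ == "__main__":
    for name, (holdout, cv) in compare(texts, labels).items():
        print(f"{name:18s} hold-out={holdout:.2f}  cv={cv:.2f}")
```

Scores from this toy corpus say nothing about the paper's results. For context when reading the reported figures, the dataset's 3,500 to 1,500 split means a classifier that always predicts "general news" would already reach 70% accuracy.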