Machine learning model for predicting cyber-criminal characteristics

IF 1.1, Zone 4 (multidisciplinary journals), Q3 MULTIDISCIPLINARY SCIENCES

Kuwait Journal of Science Pub Date : 2025-08-14 DOI:10.1016/j.kjs.2025.100487

Hesham A. Almansouri, Mohammad M. Khajah, Nasser B. Alsnayen

Kuwait Journal of Science, Volume 53 (1), Article 100487. https://www.sciencedirect.com/science/article/pii/S2307410825001312


Abstract

This study aims to predict the investigation outcomes of individual cybercrime cases from the most relevant information provided by complainants. We curated a dataset of solved hacking cases from 2019 to 2022 from the cyber-crime combating department (3CD) in the State of Kuwait. Each case has a set of information provided by the complainant (input features) and a corresponding set of investigation results (outputs). For each output, several machine learning models, such as decision trees and feed-forward neural networks, were evaluated via nested 5-fold cross-validation to measure how well they could predict the output given the input features. Input feature sets were selected either from all possible input feature combinations (brute force) or from a limited set of officer-provided combinations (officer-guided). Finally, a post-hoc analysis of the results was performed to identify a single set of features that can be used to build reasonably predictive models for all collected outputs. Depending on the output, the brute-force and officer-guided approaches have median relative advantages of 92% and 53% over the baseline models and the worst-officer score, respectively. On almost all outputs, the brute-force approach is at least as good as the officer-guided approach. No relationship was observed between officer rank and the predictive power of the combination of features they selected. Different outputs require different sets of features, and there is significant overlap between brute-force and officer-guided features in five of the 10 outputs. Most selected features have a reliable negative impact on prediction performance when perturbed, with some outputs relying on a few critical features and others on a spectrum of features. Finally, a single set of features can predict most outputs almost as well as output-specific features.
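The evaluation scheme described in the abstract (nested 5-fold cross-validation, with the feature set chosen either by brute force or from a short list of supplied combinations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the "officer" subsets are hypothetical, and the lookup-table classifier is a toy stand-in for the decision trees and neural networks used in the study. It is not the authors' implementation.

```python
import itertools
import random
from collections import Counter, defaultdict


def k_folds(n, k, seed):
    """Split positions 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_lookup(X, y, rows, feats):
    """Toy classifier: majority label per feature-value tuple, plus a global fallback."""
    table = defaultdict(Counter)
    for r in rows:
        table[tuple(X[r][f] for f in feats)][y[r]] += 1
    default = Counter(y[r] for r in rows).most_common(1)[0][0]
    return {key: c.most_common(1)[0][0] for key, c in table.items()}, default


def accuracy(model, X, y, rows, feats):
    """Fraction of `rows` whose label the fitted lookup model predicts correctly."""
    table, default = model
    hits = sum(table.get(tuple(X[r][f] for f in feats), default) == y[r] for r in rows)
    return hits / len(rows)


def cv_score(X, y, rows, feats, k, seed):
    """Mean accuracy of one feature subset under k-fold CV restricted to `rows`."""
    folds = k_folds(len(rows), k, seed)
    scores = []
    for i, fold in enumerate(folds):
        test = [rows[j] for j in fold]
        train = [rows[j] for m, other in enumerate(folds) if m != i for j in other]
        scores.append(accuracy(fit_lookup(X, y, train, feats), X, y, test, feats))
    return sum(scores) / k


def nested_cv(X, y, candidate_subsets, k=5, seed=0):
    """Outer folds estimate performance; inner folds choose the feature subset."""
    outer = k_folds(len(X), k, seed)
    results = []
    for i, test in enumerate(outer):
        train = [j for m, fold in enumerate(outer) if m != i for j in fold]
        # Selection only ever sees the outer training rows, never the outer test fold.
        best = max(candidate_subsets, key=lambda s: cv_score(X, y, train, s, k, seed + 1))
        model = fit_lookup(X, y, train, best)
        results.append((best, accuracy(model, X, y, test, best)))
    return results


def all_subsets(n_features):
    """Brute force: every non-empty combination of feature indices."""
    return [c for r in range(1, n_features + 1)
            for c in itertools.combinations(range(n_features), r)]


# Synthetic data (not from the paper): 4 binary features; the label depends on features 0 and 2.
rng = random.Random(42)
X = [[rng.randint(0, 1) for _ in range(4)] for _ in range(200)]
y = [row[0] ^ row[2] for row in X]

brute = nested_cv(X, y, all_subsets(4))
guided = nested_cv(X, y, [(0, 1), (1, 3), (0, 2, 3)])  # hypothetical "officer" picks
for name, res in (("brute force", brute), ("officer-guided", guided)):
    print(name, [(subset, round(acc, 2)) for subset, acc in res])
```

The point of the nesting is that the feature subset is selected using only the outer training rows, so the outer-fold accuracy is not inflated by the selection step. The brute-force search grows as 2^n in the number of features, which is why it is only practical for a small feature pool like the one in this toy example.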


Source journal

Kuwait Journal of Science (MULTIDISCIPLINARY SCIENCES)

CiteScore: 1.60

Self-citation rate: 28.60%

Articles published: 132

About the journal: Kuwait Journal of Science (KJS) is indexed and abstracted by major publishing houses such as Chemical Abstracts, Science Citation Index, Current Contents, Mathematics Abstract, and Microbiological Abstracts. KJS publishes peer-reviewed articles in various fields of science, including Mathematics, Computer Science, Physics, Statistics, Biology, Chemistry, and Earth & Environmental Sciences. It also aims to bring the results of scientific research carried out under a variety of intellectual traditions and organizations to the attention of a specialized scholarly readership. As such, the publisher expects submissions of original manuscripts that contain analysis of, and solutions to, important theoretical, empirical, and normative issues.