Enhancing Forensic Analysis Using a Machine Learning-based Approach

Samira Benkerroum, Khalid Chougdali
2023 6th International Conference on Advanced Communication Technologies and Networking (CommNet), pp. 1-6, December 11, 2023. DOI: 10.1109/CommNet60167.2023.10365260

Abstract

In recent years, computers and other digital devices have contributed to the global spread of cyber threats and cybercrime. These cyberattacks leave artefacts on the storage of the target device. Those artefacts therefore require special handling and must be examined in investigations that study the attack's behavior, analyze it, and help prevent it from recurring. Despite continued progress in digital forensic investigation for recovering both volatile and non-volatile evidence, manual investigations remain time-intensive and laborious. The proposed solution is to automate manual forensic investigation tasks (forensic analysis) in order to reduce human effort and save time. This paper summarizes the digital forensic investigation process and discusses existing machine learning (ML) solutions for automating the analysis step. Finally, it proposes an ML-based approach in which malware is identified through binary classification on the CIC-MalMem-2022 dataset using eight algorithms: K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, Gradient Boosted Tree, and Multi-Layer Perceptron. The algorithms' performances were compared using Precision, Recall, F1-score, Accuracy, and Area Under the Curve (AUC). Random Forest and Gradient Boosted Tree performed best, each reaching an accuracy of 99.98% in detecting malware through memory scans. Logistic Regression performed worst on the memory data, with an accuracy of 95.75%. Overall, many of the algorithms tested achieved very satisfactory results.
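The abstract names five evaluation metrics without defining them. The following is a minimal, standard-library-only Python sketch of how these metrics are conventionally computed for a binary malware/benign task. It assumes malware is encoded as 1 (the positive class) and benign as 0. The function names are hypothetical, and this is not the authors' code. AUC is computed with the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve, with tied scores counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), assuming label 1 = malware (positive), 0 = benign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # malware correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # benign sample wrongly flagged as malware
        elif t == 0 and p == 0:
            tn += 1  # benign sample correctly passed
        else:
            fn += 1  # malware missed
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from hard (0/1) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random malware sample scores higher than a random benign one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and classifier scores, purely illustrative (not results from the paper).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 decision threshold
    print(classification_metrics(y_true, y_pred))
    # → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
    print(roc_auc(y_true, scores))
    # → 0.9375
```

Note that accuracy, precision, recall and F1 depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent. The pairwise AUC loop is quadratic in the number of samples. For a dataset the size of CIC-MalMem-2022, a sort-based implementation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.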