Sysmon event logs for machine learning-based malware detection

Riki Mi’roj Achmad, Dyah Putri Nariswari, Baskoro Adi Pratomo, Hudan Studiawan
Journal: Cyber Security and Applications, vol. 3, Article 100110
Published: 2025-07-23
DOI: 10.1016/j.csa.2025.100110
URL: https://www.sciencedirect.com/science/article/pii/S277291842500027X
Citations: 0

Abstract

Malware poses a significant threat to modern computing environments, necessitating advanced detection techniques that can adapt to evolving attack methods. This study focuses on dynamic malware analysis, using machine learning models to process detailed data from Sysmon event logs, a crucial source of system information that records the activities of running programs. Sysmon events describe what a program does during execution: created processes, initiated network connections, DNS queries, modified files and registry keys, and other types of events. Such information can be used to classify software as malicious or benign. In this research, we employed both classification (supervised learning) and outlier detection (unsupervised learning) approaches: Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine (SVM) for supervised learning, and Isolation Forest, Local Outlier Factor (LOF), and One-Class SVM for unsupervised learning. An extensive set of experiments was conducted to find the best approach and the most relevant features. Principal Component Analysis (PCA) was applied to select the most relevant features for both supervised and unsupervised learning models. The experiments showed that the Local Outlier Factor (LOF) model with its twenty best features achieved the best performance, with an F1 score of 0.9873.
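To make the outlier-detection idea concrete, the following is a minimal from-scratch sketch of the Local Outlier Factor score in Python. It is not the authors' implementation: the feature vectors (per-program counts of process creations, network connections, and registry modifications), the neighbourhood size `k`, and the `threshold` value are all made up for illustration. The sketch also omits the PCA step and simplifies LOF by taking exactly `k` nearest neighbours and assuming no duplicate points.

```python
import math

# Hypothetical per-program feature vectors (NOT from the paper's dataset):
# (process creations, network connections, registry modifications).
SAMPLES = [
    (2, 1, 3), (3, 1, 2), (2, 2, 3), (3, 2, 2), (2, 1, 2), (3, 1, 3),  # similar programs
    (40, 25, 60),                                                      # unusually active program
]


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point (brute force, O(n^2)).

    Simplifications: exactly k neighbours are used (distance ties are broken
    by index), and all points are assumed to be distinct.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    # k nearest neighbours of each point as (distance, index), nearest first.
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), d(p, o)).
    lrd = [k / sum(max(k_dist[j], d) for d, j in nb) for nb in neighbours]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for _, j in nb) / (k * lrd[i]) for i, nb in enumerate(neighbours)]


def flag_outliers(points, k, threshold):
    """Return indices of points whose LOF exceeds the (hypothetical) threshold."""
    return [i for i, s in enumerate(lof_scores(points, k)) if s > threshold]


if __name__ == "__main__":
    for sample, score in zip(SAMPLES, lof_scores(SAMPLES, k=3)):
        print(sample, round(score, 3))
    print("flagged indices:", flag_outliers(SAMPLES, k=3, threshold=1.5))
```

A score close to 1 means a point is about as dense as its neighbours, while a score well above 1 marks a point in a much sparser region. In this toy data only the last vector is flagged. In practice a library implementation, with proper feature scaling and a validated threshold, would be preferable to this sketch.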