An Exploratory Analysis of Feature Selection for Malware Detection with Simple Machine Learning Algorithms

Impact factor: 0.7 | JCR quartile: Q4 (Computer Science, Information Systems)

Journal of Communications Software and Systems | Published: 2023-01-01 | DOI: 10.24138/jcomss-2023-0091

Md Ashikur Rahman, Syful Islam, Yusuf Sulistyo Nugroho, Fatah Yasin Al Irsyadi, Md Javed Hossain


Citations: 0


Abstract

Computers have become increasingly vulnerable to malicious attacks as their popularity has grown and open system architectures have proliferated. Numerous malware detection technologies are available to protect the operating system from such attacks. These detectors flag programs based on patterns found in the properties of applications. As the volume of data to be analyzed grows, however, detection systems suffer: the presence of many irrelevant features hinders the performance of the detection mechanism. The goal of this study is to provide a feature selection approach that makes malware detection systems more accurate by identifying pertinent and significant features. Selecting only the most important features also makes it possible to maintain an acceptable level of detection accuracy while significantly lowering the computational cost. The proposed method includes data cleaning and feature selection, and it reports the most important features (MIFs) obtained from each machine learning method. It then applies six machine learning classifiers to the selected feature set: Support Vector Machines (SVM), Logistic Regression (LR), K-nearest neighbors (K-NN), Decision Tree (DT), Naive Bayes (NB), and Random Forest (RF). The proposed model was evaluated on two malware datasets. The experimental results show that the RF and DT classifiers outperform the other techniques in accuracy, precision, recall, and F1 score.
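The abstract describes a pipeline that ranks features, keeps the most informative ones, and then scores classifiers by accuracy, precision, recall, and F1. The abstract does not specify how the MIFs are derived from each classifier. As a stand-in, the minimal sketch below uses a simple filter method: information gain on discrete (for example binary) features. It then computes the four reported metrics from predicted labels. This sketch is not the authors' implementation. The feature names, toy data, and `k` are hypothetical, and the code uses only the Python standard library.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """Drop in label entropy after partitioning the samples on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(
    rows: Sequence[Sequence[int]],
    labels: Sequence[int],
    feature_names: Sequence[str],
    k: int,
) -> Tuple[List[str], Dict[str, float]]:
    """Score every feature by information gain and keep the k highest-scoring names."""
    scores = {
        name: information_gain([row[j] for row in rows], labels)
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` (malware) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: one row of binary features per app; label 1 = malware.
    names = ["uses_crypto_api", "reads_contacts", "has_icon"]
    X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    top, scores = select_top_k(X, y, names, k=1)
    print(top)  # ['uses_crypto_api']
    # Score a hypothetical set of classifier predictions against the true labels.
    print(classification_metrics(y, [1, 1, 0, 0, 1, 0]))
```

The design keeps feature ranking and evaluation separate. This lets the same metric function score any of the six classifiers on the reduced feature set. In a real experiment, the ranking should be computed on the training split only, so that test labels do not leak into feature selection.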


Source journal

Journal of Communications Software and Systems (subject area: Engineering, Electrical and Electronic Engineering)

CiteScore: 2.00

Self-citation rate: 14.30%

Articles published:

Review time: 8 weeks