Improving Classification Performance for Malware Detection Using Genetic Programming Feature Selection Techniques

IF 1.1 Q3 CRIMINOLOGY & PENOLOGY
Heba Harahsheh, M. Alshraideh, S. Al-Sharaeh, R. Al-Sayyed
Journal of Applied Security Research, volume 18 1, pages 627 - 647, published 2022-05-01 (Journal Article)
DOI: 10.1080/19361610.2022.2067459 (https://doi.org/10.1080/19361610.2022.2067459)
Citations: 1

Abstract

Malware is any malicious software or code that harms systems, and new malicious programs appear daily. Because most new malware contains patterns similar to those of older samples, machine learning is now used to classify malware according to its characteristics. This paper proposes two feature selection methods based on Genetic Programming (GP) for predicting malware: Genetic Programming-Mean (GPM) and Genetic Programming-Mean Plus (GPMP). Both methods were compared with three widely used, state-of-the-art feature selection techniques: filter-based, wrapper-based, and Chi-square. The results demonstrate that the proposed techniques outperform these three in accuracy and F-score. The proposed methods also required less computation time than filter-based and wrapper-based feature selection. The proposed methods were evaluated on four datasets with two classifiers: Random Forest and Decision Tree. Random Forest outperformed Decision Tree on indicators such as F1-score, recall, and precision. The analysis of results with both classifiers indicates that the proposed methods are highly efficient.
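To make the Chi-square baseline and the reported indicators concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation, and it does not reproduce GPM or GPMP, whose details the abstract does not give. It assumes binary features and binary labels (1 = malware, 0 = benign); the function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def chi_square_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 table: binary feature vs. binary label."""
    n = len(labels)
    # observed[f][c] counts samples with feature value f and class c
    observed = [[0, 0], [0, 0]]
    for f, c in zip(feature, labels):
        observed[f][c] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    score = 0.0
    for f in (0, 1):
        for c in (0, 1):
            expected = row[f] * col[c] / n
            if expected > 0:  # skip empty rows/columns to avoid division by zero
                score += (observed[f][c] - expected) ** 2 / expected
    return score


def select_top_k(X: Sequence[Sequence[int]], y: Sequence[int], k: int) -> List[int]:
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = [chi_square_score([r[j] for r in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): column 0 tracks the label, columns 1 and 2 do not.
X = [
    [1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0],
    [0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_top_k(X, y, k=1)
metrics = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

In a comparison like the one the abstract describes, the selected columns would be passed to a classifier such as Random Forest or Decision Tree, and the resulting predictions scored with the same three indicators.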
Source journal
Journal of Applied Security Research (CRIMINOLOGY & PENOLOGY)
CiteScore: 2.90
Self-citation rate: 15.40%
Articles published: 35