Heba Harahsheh, M. Alshraideh, S. Al-Sharaeh, R. Al-Sayyed
Title: Improving Classification Performance for Malware Detection Using Genetic Programming Feature Selection Techniques
DOI: 10.1080/19361610.2022.2067459
Published: 2022-05-01
Citations: 1
Abstract
Malware is any malicious software or code that is harmful to systems, and new malicious programs appear daily. Machine learning is now used to classify malware according to its characteristics, because most new malware contains patterns similar to those of older samples. This paper proposes two feature selection methods based on Genetic Programming (GP) for predicting malware: Genetic Programming-Mean (GPM) and Genetic Programming-Mean Plus (GPMP). The two methods were compared with three widely used feature selection techniques: filter-based, wrapper-based, and Chi-square. The results show that the proposed techniques outperform these three in accuracy and F-score, and that they require less computation time than filter-based and wrapper-based feature selection. The proposed methods were evaluated on four datasets using two classifiers, Random Forest and Decision Tree. Random Forest outperformed Decision Tree on indicators such as F1-score, recall, and precision. The analysis of results with both classifiers shows that the proposed methods are highly efficient.
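To make the Chi-square baseline named in the abstract concrete, the following is a minimal sketch of chi-square feature ranking for binary features and binary labels. It is not the paper's GPM or GPMP method, whose details the abstract does not give. The function names `chi_square_score` and `select_top_k` and the toy data in `ROWS` and `LABELS` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical toy data: 8 samples with 3 binary features each.
# Label 1 = malware, 0 = benign (an assumption for illustration only).
ROWS = [
    (1, 1, 1), (1, 0, 1), (1, 1, 1), (1, 0, 0),
    (0, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def chi_square_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-square statistic for one binary feature against binary labels."""
    n = len(labels)
    # Observed counts of the 2x2 contingency table, indexed as observed[feature][label].
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    score = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the feature and the label were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                score += (observed[i][j] - expected) ** 2 / expected
    return score


def select_top_k(rows: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k features with the highest chi-square score."""
    n_features = len(rows[0])
    scores = [
        chi_square_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

In this toy data, feature 0 matches the label exactly, feature 1 is independent of it, and feature 2 agrees with it on six of eight samples, so `select_top_k(ROWS, LABELS, 2)` keeps features 0 and 2. The selected columns would then be passed to a classifier such as Random Forest or Decision Tree for evaluation.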