Franklin Parrales-Bravo, Joel Torres-Urresto, Dayannara Avila-Maldonado, Julio Barzola-Monteses
Relevant and Non-Redundant Feature Subset Selection Applied to the Detection of Malware in a Network
2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM), published 2021-10-12
DOI: 10.1109/ETCM53643.2021.9590777 (https://doi.org/10.1109/ETCM53643.2021.9590777)
Citation count: 2
Abstract
Removing redundant features is one of the goals of feature subset selection (FSS) techniques. According to some studies, using only a filter or only a wrapper FSS approach does not guarantee that the selected features are non-redundant. This research therefore presents a methodology for training intrusion detection models that combines filter and wrapper FSS techniques to guarantee the selection of non-redundant attributes in the data pre-processing phase. To test the effectiveness of the proposed technique, the accuracy of models trained with the features it selects was evaluated on a malware detection dataset. The classification algorithms selected for training the malware detection models were: i) Random Forest, ii) C4.5, iii) AdaBoost, and iv) Gradient Boosting. Based on the accuracy metric, the best malware detection model was the one trained with the Random Forest algorithm. This model achieved an average accuracy of 99.42% when using the proposed feature selection technique, 0.10% higher than the accuracy of the model trained with the same algorithm but without the proposed methodology. We therefore conclude that models trained with the proposed methodology perform similarly to models that do not use it, with the added advantage that all redundant features are removed from the dataset.
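The abstract does not state which filter or which wrapper the authors combine, so the following is only a minimal sketch of the general filter-then-wrapper idea, not the paper's method. Everything in it is an assumption made for illustration: the Pearson-correlation filter and its thresholds, the greedy forward-selection wrapper, the nearest-centroid stand-in classifier (the paper uses Random Forest, C4.5, AdaBoost, and Gradient Boosting), and the synthetic four-feature dataset.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def filter_stage(X, y, min_relevance=0.1, max_redundancy=0.9):
    """Filter (assumed): keep label-correlated features, drop near-duplicates."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = {j: abs(pearson(c, y)) for j, c in enumerate(cols)}
    ranked = sorted((j for j in relevance if relevance[j] >= min_relevance),
                    key=lambda j: -relevance[j])
    kept = []
    for j in ranked:  # visit the most relevant features first
        if all(abs(pearson(cols[j], cols[k])) < max_redundancy for k in kept):
            kept.append(j)
    return kept

def centroid_accuracy(X_train, y_train, X_test, y_test, feats):
    """Toy stand-in classifier: nearest class centroid on the chosen features."""
    centroids = {}
    for lab in set(y_train):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in feats]

    def predict(row):
        return min(centroids, key=lambda lab: sum(
            (row[j] - c) ** 2 for j, c in zip(feats, centroids[lab])))

    hits = sum(1 for r, l in zip(X_test, y_test) if predict(r) == l)
    return hits / len(y_test)

def wrapper_stage(X_train, y_train, X_val, y_val, candidates):
    """Wrapper (assumed): greedy forward selection driven by validation accuracy."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X_train, y_train, X_val, y_val, selected + [j])
            if acc > best:
                best, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best

# Hypothetical synthetic data: f0 is informative, f1 is a rescaled copy of f0
# (redundant), f2 is pure noise, f3 is weakly informative.
rng = random.Random(0)
X, y = [], []
for i in range(200):
    label = i % 2
    f0 = 2 * label + rng.gauss(0, 0.5)
    f1 = 3 * f0 + rng.gauss(0, 0.01)
    f2 = rng.gauss(0, 1)
    f3 = label + rng.gauss(0, 1)
    X.append([f0, f1, f2, f3])
    y.append(label)

X_train, y_train, X_val, y_val = X[:140], y[:140], X[140:], y[140:]
kept = filter_stage(X_train, y_train)
selected, best = wrapper_stage(X_train, y_train, X_val, y_val, kept)
print("after filter:", kept)
print("after wrapper:", selected, "validation accuracy:", round(best, 3))
```

In this sketch the filter stage removes one of the two near-identical features before the wrapper runs, so the wrapper only has to search among candidates that are already mutually non-redundant. That ordering is one plausible reading of "a combination of filter and wrapper FSS techniques"; the paper itself may combine them differently.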