EEMDS: Efficient and Effective Malware Detection System with Hybrid Model based on XceptionCNN and LightGBM Algorithm

Journal of Computing and Social Informatics. Pub Date: 2022-10-28. DOI: 10.33736/jcsi.4739.2022

M. Onoja, Abayomi Jegede, N. Blamah, Abinbola Victor Olawale, T. O. Omotehinwa


Citation count: 1

Abstract

The security threats posed by malware make it imperative to build a model that classifies malware by family efficiently and effectively, irrespective of the variant. Preliminary experiments demonstrate that the generic LightGBM algorithm is suitable for Windows malware and performs well in terms of detection accuracy, training accuracy, prediction time and training time. On the Malimg dataset, the prediction time of the generic LightGBM is 0.08s for binary classification and 0.40s for multi-class classification, and its classification accuracy is a 99% True Positive Rate (TPR). Its training accuracy is 99.80% for binary and 96.87% for multi-class classification, with training times of 179.51s and 2224.77s respectively. This performance leaves room for improvement: higher classification and training accuracy are needed for effective decision making, shorter prediction and training times are needed for efficiency, and both are needed for effectiveness on larger samples. The goal is therefore to enhance detection accuracy and reduce prediction time, since faster prediction allows malware to be detected early, before it damages files stored in computer systems. The proposed model is a hybrid that integrates XceptionCNN with the LightGBM algorithm for Windows malware classification in the Google Colab environment. It uses the Malimg malware dataset, a benchmark dataset for Windows malware image classification. It contains 9,339 malware samples from 25 families, structured as grayscale images, and 1,042 benign Windows executable files extracted from Windows environments.

Performance evaluation on the Malimg dataset demonstrates the effectiveness and efficiency of the hybrid model. The proposed XceptionCNN-LightGBM technique improves classification accuracy to 100% TPR, with prediction times of 0.08s for binary and 0.37s for multi-class classification. Compared with the generic LightGBM (0.08s and 0.40s), the binary prediction time is unchanged and the multi-class prediction time is lower. Training accuracy increased to 99.85% for binary classification and 97.40% for multi-class classification, while training time fell to 29.97s and 447.75s respectively, down from 179.51s and 2224.77s for the generic LightGBM model. This significant reduction in training time allows the model to converge quickly and to train on a large volume of data within a relatively short period. Overall, the reduction in detection time and the improvement in detection accuracy will minimize damage to files stored in computer systems in the event of a malware attack.
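The timing figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only an illustration and not part of the paper: it assumes the hybrid model's figures are the resulting times (the reading suggested by the comparison with the generic LightGBM times), and the names `REPORTED` and `compare` are hypothetical helpers introduced here.

```python
from typing import Dict, Tuple

# Timings in seconds as quoted in the abstract: (generic LightGBM, hybrid model).
# Assumption: the hybrid figures are resulting times, not amounts saved.
REPORTED: Dict[str, Tuple[float, float]] = {
    "training, binary": (179.51, 29.97),
    "training, multi-class": (2224.77, 447.75),
    "prediction, binary": (0.08, 0.08),
    "prediction, multi-class": (0.40, 0.37),
}


def compare(baseline: float, hybrid: float) -> Tuple[float, float]:
    """Return (speedup factor, percentage reduction) of hybrid relative to baseline."""
    speedup = baseline / hybrid
    reduction_pct = 100.0 * (baseline - hybrid) / baseline
    return speedup, reduction_pct


# Print one summary line per reported measurement.
for name, (baseline, hybrid) in REPORTED.items():
    speedup, reduction_pct = compare(baseline, hybrid)
    print(f"{name}: {baseline}s -> {hybrid}s "
          f"({speedup:.2f}x faster, {reduction_pct:.1f}% less time)")
```

Under that reading, training is roughly six times faster for binary classification and roughly five times faster for multi-class classification, whereas prediction time is unchanged for the binary case and about 7.5% lower for the multi-class case.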



