Deep hybrid approach with sequential feature extraction and classification for robust malware detection

IF 5.0; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Egyptian Informatics Journal, vol. 27, Article 100539. Pub Date: 2024-09-01. DOI: 10.1016/j.eij.2024.100539

Swapnil Singh, Deepa Krishnan, Vidhi Vazirani, Vinayakumar Ravi, Suliman A. Alsuhibany


Citations: 0

Abstract


Deep hybrid approach with sequential feature extraction and classification for robust malware detection

Malware attacks have escalated significantly as the number of internet users and connected devices has grown. With hackers releasing increasingly diverse types of malware, designing new and competitive techniques to detect advanced malware is essential. In this research, we developed a multi-level feature extraction technique using deep learning architectures, together with a classification model, to classify malware families. In the first step, a Gated Recurrent Unit (GRU) extracts the essential features from the malware images; these are then fed to a Convolutional Neural Network (CNN) model that extracts the final feature vector. The multi-level feature extraction is followed by classification into malware families using a Cost-sensitive Boot Strapped Weighted Random Forest (CSBW-RF). The proposed approach achieved 99.58% accuracy in distinguishing the 25 malware families in the Malimg dataset. The hybrid model also gave significantly better performance scores when classifying visually similar malware families. The generalizability of the proposed model was benchmarked on the popular Microsoft BIG 2015 dataset, where it achieved higher performance scores than many existing models. This benchmarking demonstrates the robustness and scalability of our approach. The use of cost-sensitive learning and bootstrapping also contributed to the model's ability to generalize to new and unseen data. These enhancements allow our model to be applied effectively in diverse real-world scenarios while maintaining high performance across different environments and malware types. This research can contribute to detecting malware attacks and can be integrated into threat monitoring systems. The successful application of this hybrid model indicates its potential for deployment in real-world cybersecurity environments as a strong defense against evolving malware threats.
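The abstract does not spell out how cost-sensitive learning and bootstrapping are combined in CSBW-RF. As a rough illustration of the general idea only, the sketch below computes class weights that are inversely proportional to class frequency and then draws a weighted bootstrap sample, so that a rare malware family is not drowned out by a common one. The weighting formula (the common "balanced" heuristic), the function names, and the toy label counts are all assumptions made for this example; none of them comes from the paper.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_bootstrap(labels, class_weights, rng):
    """Draw len(labels) indices with replacement.

    Each sample is drawn with probability proportional to the weight of
    its class, so rare (high-weight) classes are over-sampled.
    """
    sample_weights = [class_weights[y] for y in labels]
    return rng.choices(range(len(labels)), weights=sample_weights, k=len(labels))


# Hypothetical, made-up label distribution: one common and one rare family.
labels = ["family_a"] * 90 + ["family_b"] * 10

weights = balanced_class_weights(labels)
rng = random.Random(0)  # fixed seed so the run is repeatable
indices = weighted_bootstrap(labels, weights, rng)
resampled = Counter(labels[i] for i in indices)

print(weights)    # the rare family receives the larger weight
print(resampled)  # the bootstrap sample is roughly balanced
```

In a forest, each tree would typically be trained on its own bootstrap sample of this kind. Whether the authors resample by weight, weight the split criterion, or weight the votes is not stated in the abstract, so this should be read as one plausible interpretation rather than their method.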


Source journal

Egyptian Informatics Journal (Decision Sciences: Management Science and Operations Research)

CiteScore

11.10

Self-citation rate

1.90%

Articles published

Review time

110 days

About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. Submissions of innovative, previously unpublished work in the subjects covered by the Journal are encouraged, whether from academic, research or commercial sources.