使用可解释的深度学习推进恶意软件图像分类：使用SHAP， LIME和Grad-CAM的最先进方法。

IF 2.6 3区综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE Pub Date : 2025-05-28 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0318542

Sadia Nazim, Muhammad Mansoor Alam, Syed Safdar Rizvi, Jawahir Che Mustapha, Syed Shujaa Hussain, Mazliham Mohd Suud

{"title":"使用可解释的深度学习推进恶意软件图像分类：使用SHAP， LIME和Grad-CAM的最先进方法。","authors":"Sadia Nazim, Muhammad Mansoor Alam, Syed Safdar Rizvi, Jawahir Che Mustapha, Syed Shujaa Hussain, Mazliham Mohd Suud","doi":"10.1371/journal.pone.0318542","DOIUrl":null,"url":null,"abstract":"Artificial Intelligence (AI) is being integrated into increasingly more domains of everyday activities. Whereas AI has countless benefits, its convoluted and sometimes vague internal operations can establish difficulties. Nowadays, AI is significantly employed for evaluations in cybersecurity that find it challenging to justify their proceedings; this absence of accountability is alarming. Additionally, over the last ten years, the fractional elevation in malware variants has directed scholars to utilize Machine Learning (ML) and Deep Learning (DL) approaches for detection. Although these methods yield exceptional accuracy, they are also difficult to understand. Thus, the advancement of interpretable and powerful AI models is indispensable to their reliability and trustworthiness. The trust of users in the models used for cybersecurity would be undermined by the ambiguous and indefinable nature of existing AI-based methods, specifically in light of the more complicated and diverse nature of cyberattacks in modern times. The present research addresses the comparative analysis of an ensemble deep neural network (DNNW) with different ensemble techniques like RUSBoost, Random Forest, Subspace, AdaBoost, and BagTree for the best prediction against imagery malware data. It determines the best-performing model, an ensemble DNNW, for which explainability is provided. There has been relatively little study on explainability, especially when dealing with malware imagery data, irrespective of the fact that DL/ML algorithms have revolutionized malware detection. Explainability techniques such as SHAP, LIME, and Grad-CAM approaches are employed to present a complete comprehension of feature significance and local or global predictive behavior of the model over various malware categories. A comprehensive investigation of significant characteristics and their impact on the decision-making process of the model and multiple query point visualizations are some of the contributions. This strategy promotes advanced transparency and trustworthy cybersecurity applications by improving the comprehension of malware detection techniques and integrating explainable AI observations with domain-specific knowledge.","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 5","pages":"e0318542"},"PeriodicalIF":2.6000,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12118971/pdf/","citationCount":"0","resultStr":"{\"title\":\"Advancing malware imagery classification with explainable deep learning: A state-of-the-art approach using SHAP, LIME and Grad-CAM.\",\"authors\":\"Sadia Nazim, Muhammad Mansoor Alam, Syed Safdar Rizvi, Jawahir Che Mustapha, Syed Shujaa Hussain, Mazliham Mohd Suud\",\"doi\":\"10.1371/journal.pone.0318542\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial Intelligence (AI) is being integrated into increasingly more domains of everyday activities. Whereas AI has countless benefits, its convoluted and sometimes vague internal operations can establish difficulties. Nowadays, AI is significantly employed for evaluations in cybersecurity that find it challenging to justify their proceedings; this absence of accountability is alarming. Additionally, over the last ten years, the fractional elevation in malware variants has directed scholars to utilize Machine Learning (ML) and Deep Learning (DL) approaches for detection. Although these methods yield exceptional accuracy, they are also difficult to understand. Thus, the advancement of interpretable and powerful AI models is indispensable to their reliability and trustworthiness. The trust of users in the models used for cybersecurity would be undermined by the ambiguous and indefinable nature of existing AI-based methods, specifically in light of the more complicated and diverse nature of cyberattacks in modern times. The present research addresses the comparative analysis of an ensemble deep neural network (DNNW) with different ensemble techniques like RUSBoost, Random Forest, Subspace, AdaBoost, and BagTree for the best prediction against imagery malware data. It determines the best-performing model, an ensemble DNNW, for which explainability is provided. There has been relatively little study on explainability, especially when dealing with malware imagery data, irrespective of the fact that DL/ML algorithms have revolutionized malware detection. Explainability techniques such as SHAP, LIME, and Grad-CAM approaches are employed to present a complete comprehension of feature significance and local or global predictive behavior of the model over various malware categories. A comprehensive investigation of significant characteristics and their impact on the decision-making process of the model and multiple query point visualizations are some of the contributions. This strategy promotes advanced transparency and trustworthy cybersecurity applications by improving the comprehension of malware detection techniques and integrating explainable AI observations with domain-specific knowledge.\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 5\",\"pages\":\"e0318542\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12118971/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0318542\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0318542","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

摘要

人工智能（AI）正在融入越来越多的日常活动领域。尽管人工智能有无数的好处，但其复杂且有时模糊的内部操作可能会带来困难。如今，人工智能被大量用于网络安全评估，这些评估很难证明其程序的合理性；这种问责制的缺失令人担忧。此外，在过去十年中，恶意软件变体的少量上升促使学者们利用机器学习（ML）和深度学习（DL）方法进行检测。虽然这些方法产生了非凡的准确性，但它们也很难理解。因此，可解释和强大的人工智能模型的进步对于它们的可靠性和可信度是必不可少的。用户对用于网络安全的模型的信任将被现有的基于人工智能的方法的模糊和不可定义的性质所破坏，特别是考虑到现代网络攻击的复杂性和多样性。本研究将集成深度神经网络（DNNW）与不同的集成技术（如RUSBoost、Random Forest、Subspace、AdaBoost和BagTree）进行比较分析，以获得对图像恶意软件数据的最佳预测。它决定了表现最好的模型，一个集成DNNW，它提供了可解释性。关于可解释性的研究相对较少，特别是在处理恶意软件图像数据时，尽管DL/ML算法已经彻底改变了恶意软件检测。可解释性技术，如SHAP、LIME和Grad-CAM方法，用于对各种恶意软件类别的模型的特征意义和局部或全局预测行为进行完整的理解。对重要特征及其对模型决策过程的影响的全面研究和多查询点可视化是其中的一些贡献。该策略通过提高对恶意软件检测技术的理解，并将可解释的人工智能观察与领域特定知识相结合，促进了高级透明度和可信赖的网络安全应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Advancing malware imagery classification with explainable deep learning: A state-of-the-art approach using SHAP, LIME and Grad-CAM.

Artificial Intelligence (AI) is being integrated into increasingly more domains of everyday activities. Whereas AI has countless benefits, its convoluted and sometimes vague internal operations can establish difficulties. Nowadays, AI is significantly employed for evaluations in cybersecurity that find it challenging to justify their proceedings; this absence of accountability is alarming. Additionally, over the last ten years, the fractional elevation in malware variants has directed scholars to utilize Machine Learning (ML) and Deep Learning (DL) approaches for detection. Although these methods yield exceptional accuracy, they are also difficult to understand. Thus, the advancement of interpretable and powerful AI models is indispensable to their reliability and trustworthiness. The trust of users in the models used for cybersecurity would be undermined by the ambiguous and indefinable nature of existing AI-based methods, specifically in light of the more complicated and diverse nature of cyberattacks in modern times. The present research addresses the comparative analysis of an ensemble deep neural network (DNNW) with different ensemble techniques like RUSBoost, Random Forest, Subspace, AdaBoost, and BagTree for the best prediction against imagery malware data. It determines the best-performing model, an ensemble DNNW, for which explainability is provided. There has been relatively little study on explainability, especially when dealing with malware imagery data, irrespective of the fact that DL/ML algorithms have revolutionized malware detection. Explainability techniques such as SHAP, LIME, and Grad-CAM approaches are employed to present a complete comprehension of feature significance and local or global predictive behavior of the model over various malware categories. A comprehensive investigation of significant characteristics and their impact on the decision-making process of the model and multiple query point visualizations are some of the contributions. This strategy promotes advanced transparency and trustworthy cybersecurity applications by improving the comprehension of malware detection techniques and integrating explainable AI observations with domain-specific knowledge.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

PLoS ONE 生物-生物学

CiteScore

6.20

自引率

5.40%

发文量

14242

审稿时长

3.7 months

期刊介绍： PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides: * Open-access—freely accessible online, authors retain copyright * Fast publication times * Peer review by expert, practicing researchers * Post-publication tools to indicate quality and impact * Community-based dialogue on articles * Worldwide media coverage