Explainable AI model for PDFMal detection based on gradient boosting model
Mona Elattar, Ahmed Younes, Ibrahim Gad, Islam Elkabani
Neural Computing and Applications, published 2024-09-05
DOI: 10.1007/s00521-024-10314-y (https://doi.org/10.1007/s00521-024-10314-y)
Abstract
Portable Document Format (PDF) files are widely used for document exchange because of their versatility. However, PDFs are highly vulnerable to malware attacks, which pose significant security risks. Existing defense mechanisms often struggle to detect and mitigate these threats, highlighting the need for more robust solutions. This paper introduces a framework that uses tree-based ensemble models to detect malicious PDFs, using the Evasive-PDFMal2022 dataset. The proposed model achieves a recall of 100%, an accuracy of 99.95%, and an inference time of 0.1723 s. The framework also exhibits low false positive and false negative rates, so it distinguishes malicious from benign PDFs with high precision. Shapley additive explanations (SHAP) are used to improve the interpretability and reliability of the model’s predictions. The results highlight the effectiveness of the proposed model in improving PDF document security and addressing the challenges posed by malware attacks.
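To make the relationship between the reported metrics concrete, the following minimal sketch computes recall, accuracy, precision, and the false positive and false negative rates from binary labels. The function name `detection_metrics`, the label encoding (1 = malicious, 0 = benign), and the example labels are assumptions made for illustration; they are not taken from the paper or from the Evasive-PDFMal2022 dataset.

```python
def detection_metrics(y_true, y_pred):
    """Compute standard binary-detection metrics.

    Assumed encoding (not from the paper): 1 = malicious, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 rather than dividing by zero for an empty class.
        return num / den if den else 0.0

    return {
        "recall": ratio(tp, tp + fn),        # share of malicious files caught
        "precision": ratio(tp, tp + fp),     # share of alerts that are malicious
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Invented labels for illustration only: 100 malicious and 100 benign files.
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 90 + [0] * 10 + [1] * 5 + [0] * 95

metrics = detection_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall and the false negative rate always sum to 1, so a recall of 100% means no malicious file in the test set was missed. Accuracy can still fall below 100% in that case because of false positives on benign files.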