Explainable AI model for PDFMal detection based on gradient boosting model

Mona Elattar, Ahmed Younes, Ibrahim Gad, Islam Elkabani
Neural Computing and Applications (journal article), published 2024-09-05
DOI: 10.1007/s00521-024-10314-y (https://doi.org/10.1007/s00521-024-10314-y)
Citations: 0

Abstract

Portable Document Format (PDF) files are widely used for document exchange because of their versatility. However, PDFs are highly vulnerable to malware attacks, which pose significant security risks. Existing defense mechanisms often struggle to detect and mitigate these threats effectively, highlighting the need for more robust solutions. This paper introduces a framework that uses advanced tree-based ensemble models to detect malicious PDFs using the Evasive-PDFMal2022 dataset. The proposed model achieves a recall of 100%, an accuracy of 99.95%, and a fast inference time of 0.1723 s. The framework also exhibits minimal false positive and false negative rates, ensuring high precision in distinguishing malicious from benign PDFs. SHapley Additive exPlanations (SHAP) are used to improve the interpretability and reliability of the model’s predictions. The results highlight the effectiveness of the proposed model in improving PDF document security and addressing the challenges posed by malware attacks.
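The abstract names tree-based ensemble models, and the title specifies gradient boosting, but neither gives the library or hyperparameters used. As a rough illustration of how a gradient-boosted ensemble is built, the Python sketch below fits depth-1 regression trees (stumps) one round at a time to the residuals y - p, which are the negative gradient of the log loss with respect to the raw score, and sums their scaled outputs into a log-odds score. The class name `StumpBoostedClassifier`, the learning rate, and the number of rounds are hypothetical choices for illustration; this is not the authors' model.

```python
import math
from typing import Sequence

Stump = tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Fit a depth-1 regression tree to the residuals by minimizing squared error."""
    best = None  # (squared error, feature, threshold, left mean, right mean)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return (best[1], best[2], best[3], best[4])


class StumpBoostedClassifier:
    """Hypothetical minimal gradient-boosted classifier (log loss, stump base learners)."""

    def __init__(self, n_rounds: int = 20, learning_rate: float = 0.5) -> None:
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base_score = 0.0
        self.stumps: list[Stump] = []

    @staticmethod
    def _stump_output(stump: Stump, row: Sequence[float]) -> float:
        j, t, left_value, right_value = stump
        return left_value if row[j] <= t else right_value

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "StumpBoostedClassifier":
        # Start from the log-odds of the positive (malicious) class rate.
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(rate / (1 - rate))
        scores = [self.base_score] * len(X)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Negative gradient of the log loss w.r.t. the raw score is y - p.
            residuals = [yi - _sigmoid(s) for yi, s in zip(y, scores)]
            stump = _fit_stump(X, residuals)
            self.stumps.append(stump)
            scores = [
                s + self.learning_rate * self._stump_output(stump, row)
                for s, row in zip(scores, X)
            ]
        return self

    def predict_proba(self, row: Sequence[float]) -> float:
        score = self.base_score + sum(
            self.learning_rate * self._stump_output(s, row) for s in self.stumps
        )
        return _sigmoid(score)

    def predict(self, row: Sequence[float]) -> int:
        return int(self.predict_proba(row) >= 0.5)
```

Production libraries such as XGBoost, LightGBM, and scikit-learn's `GradientBoostingClassifier` grow deeper trees, use second-order (Newton) leaf values, and add options such as subsampling and regularization, so this sketch conveys only the additive, residual-fitting structure of boosting.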
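The reported figures are linked through the confusion matrix. Treating "malicious" as the positive class, recall is TP / (TP + FN), so a recall of 100% means no malicious file was missed (FN = 0); assuming both figures come from the same test set, the 0.05-percentage-point shortfall in accuracy must then consist entirely of false positives. The Python sketch below derives these metrics from raw counts. The counts in the usage block are hypothetical, chosen only to reproduce a 100% recall / 99.95% accuracy pattern, and are not the paper's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a detector where "malicious" is the positive class."""

    tp: int  # malicious PDFs correctly flagged
    fp: int  # benign PDFs wrongly flagged as malicious
    tn: int  # benign PDFs correctly passed
    fn: int  # malicious PDFs missed


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Derive common detection metrics from raw counts (assumes no empty denominators)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall": c.tp / (c.tp + c.fn),  # true positive rate / detection rate
        "precision": c.tp / (c.tp + c.fp),
        "false_positive_rate": c.fp / (c.fp + c.tn),
        "false_negative_rate": c.fn / (c.fn + c.tp),  # equals 1 - recall
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not taken from the paper.
    counts = ConfusionCounts(tp=1000, fp=1, tn=999, fn=0)
    for name, value in detection_metrics(counts).items():
        print(f"{name}: {value:.4f}")
```

With these made-up counts, recall is 1.0 and accuracy is 1999 / 2000 = 0.9995, while the single false positive appears only in precision and the false positive rate.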
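The abstract credits SHapley Additive exPlanations (SHAP) for interpretability. The underlying idea can be shown without the `shap` library: a feature's Shapley value is its marginal contribution to the prediction, averaged over all subsets of the other features with weights |S|!(n - |S| - 1)!/n!, and the values sum to the difference between the prediction and a baseline prediction. The Python sketch below computes exact Shapley values by brute-force enumeration for a toy scoring function. The function `toy_malware_score`, its feature names, weights, and the baseline are hypothetical and are not the paper's model or the Evasive-PDFMal2022 feature set. Replacing absent features with baseline values is one common simplification; SHAP's `TreeExplainer` uses a tree-specific algorithm rather than enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Mapping

Features = Mapping[str, float]


def toy_malware_score(x: Features) -> float:
    """Hypothetical stand-in for a trained model's raw score (not the paper's model).

    The product term adds an interaction so attributions differ from the raw weights.
    """
    score = 0.8 * x["js_count"] + 0.5 * x["open_action"] - 0.3 * x["page_count"]
    score += 0.4 * x["js_count"] * x["open_action"]  # JavaScript combined with auto-run
    return score


def shapley_values(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition take their baseline value. The cost grows as
    O(2^n) model calls per feature, so this only suits a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition: frozenset[str]) -> float:
        # Coalition features come from x, all others from the baseline.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi: dict[str, float] = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for each coalition S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


if __name__ == "__main__":
    sample = {"js_count": 3.0, "open_action": 1.0, "page_count": 2.0}
    reference = {"js_count": 0.0, "open_action": 0.0, "page_count": 1.0}
    phi = shapley_values(toy_malware_score, sample, reference)
    for name, contribution in phi.items():
        print(f"{name}: {contribution:+.3f}")
    # Efficiency (local accuracy): attributions sum to f(x) - f(baseline).
    print(sum(phi.values()), toy_malware_score(sample) - toy_malware_score(reference))
```

For this sample the attributions are +3.0 for `js_count`, +1.1 for `open_action`, and -0.3 for `page_count`; the interaction term's 1.2 is split equally between the two interacting features, and the total matches f(x) - f(baseline) = 3.5 - (-0.3) = 3.8.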

