Explainable Machine Learning For Malware Detection Using Ensemble Bagging Algorithms

Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing. Published: 2022-08-04. DOI: 10.1145/3549206.3549284

Rajesh Kumar, Geetha Subbiah

Citations: 2

Abstract

Vulnerabilities in software products can be exploited to attack the security systems of any organization. Malware is downloaded when an unsuspecting user clicks a hyperlink, and it is then used as a tool to exploit system vulnerabilities. Machine learning makes it possible to detect large numbers of malware samples effectively; however, machine-learning-based systems still misclassify samples, producing false positives and false negatives. The novelty of this paper is the use of explainable machine learning to improve the efficiency and robustness of the ensemble bagging algorithm Extra Trees for malware detection. The paper uses waterfall plots based on Shapley values to identify trends in the features responsible for misclassification. The trends in the five topmost features of the misclassified samples are used to derive inductive rules, which are applied to correct misclassifications and enhance the performance of bagging algorithms. The inductive rules can also be applied to detect unknown future malware, known as zero-day malware, and thereby prevent attacks on security systems. The Extra Trees bagging algorithm achieves an accuracy of 98.1% on future unknown malware; when the misclassified samples detected by the inductive rules are also counted, the accuracy is 100%. A heatmap based on the Shapley values of features confirms the topmost features for all misclassified samples in the dataset and strengthens the inductive rules.
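The abstract relies on Shapley values to explain individual predictions through waterfall plots, then uses rules derived from the top-ranked features to override misclassified samples. The sketch below is not the authors' code and makes several assumptions: it computes exact Shapley values by brute-force coalition enumeration against a single baseline point (real SHAP tooling uses a background dataset and, for tree ensembles such as Extra Trees, tree-specific algorithms like TreeSHAP), the scoring function `toy_score` and feature names `f0` to `f3` are hypothetical, and `rule_labels` stands in for whatever inductive rules one might derive. It shows how contributions are ordered and accumulated as in a waterfall plot, and how accuracy changes when a rule's label replaces the model's prediction.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, enumerating every feature coalition.

    Features outside a coalition take their value from one `baseline` point
    (a simplification of SHAP's background data). Cost is O(n * 2^n), so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1, feature i excluded
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def waterfall_rows(base: float, phi: Sequence[float],
                   names: Sequence[str]) -> List[Tuple[str, float, float]]:
    """Sort contributions by |phi|, largest first, and accumulate them from the
    base value as a waterfall plot does; the last running total equals f(x)."""
    order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
    rows: List[Tuple[str, float, float]] = []
    total = base
    for i in order:
        total += phi[i]
        rows.append((names[i], phi[i], total))
    return rows


def accuracy_with_rules(y_true: Sequence[int], y_pred: Sequence[int],
                        rule_labels: Sequence[Optional[int]]) -> float:
    """Accuracy after overriding the model wherever a rule produces a label
    (None means the rule does not fire and the model's prediction stands)."""
    final = [p if r is None else r for p, r in zip(y_pred, rule_labels)]
    return sum(t == p for t, p in zip(y_true, final)) / len(y_true)


# Hypothetical toy "malware score" over four made-up features (not the paper's model).
names = ["f0", "f1", "f2", "f3"]


def toy_score(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 0.3 * z[1] - 0.3 * z[2] + 0.1 * z[0] * z[3]


x = [1.0, 3.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)          # approx. [0.7, 0.9, -0.3, 0.2]
rows = waterfall_rows(toy_score(baseline), phi, names)
top_features = [name for name, _, _ in rows[:2]]      # the paper keeps the top five

# Hypothetical labels: the rules fire on the two samples the model gets wrong.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
rule_labels = [None, 1, None, 0, None]
print(accuracy_with_rules(y_true, y_pred, [None] * 5))  # 0.6 (model alone)
print(accuracy_with_rules(y_true, y_pred, rule_labels))  # 1.0 (model plus rules)
```

Because Shapley values satisfy the efficiency property (they sum to f(x) − f(baseline)), the last running total in `rows` equals the model output, which is what lets a waterfall plot start at the base value and end at the prediction. Slicing the first two entries of `rows` only reflects the toy's four features; the paper ranks the five topmost features.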
