A Wolf in Sheep’s Clothing: Query-Free Evasion Attacks Against Machine Learning-Based Malware Detectors with Generative Adversarial Networks

Daniel Gibert, Jordi Planes, Quan Le, Giulio Zizzo
{"title":"A Wolf in Sheep’s Clothing: Query-Free Evasion Attacks Against Machine Learning-Based Malware Detectors with Generative Adversarial Networks","authors":"Daniel Gibert, Jordi Planes, Quan Le, Giulio Zizzo","doi":"10.1109/EuroSPW59978.2023.00052","DOIUrl":null,"url":null,"abstract":"Malware detectors based on machine learning (ML) have been shown to be susceptible to adversarial malware examples. However, current methods to generate adversarial malware examples still have their limits. They either rely on detailed model information (gradient-based attacks), or on detailed outputs of the model - such as class probabilities (score-based attacks), neither of which are available in real-world scenarios. Alternatively, adversarial examples might be crafted using only the label assigned by the detector (label-based attack) to train a substitute network or an agent using reinforcement learning. Nonetheless, label-based attacks might require querying a black-box system from a small number to thousands of times, depending on the approach, which might not be feasible against malware detectors.This work presents a novel query-free approach to craft adversarial malware examples to evade ML-based malware detectors. To this end, we have devised a GAN-based framework to generate adversarial malware examples that look similar to benign executables in the feature space. To demonstrate the suitability of our approach we have applied the GAN-based attack to three common types of features usually employed by static ML-based malware detectors: (1) Byte histogram features, (2) API-based features, and (3) String-based features. Results show that our model-agnostic approach performs on par with MalGAN, while generating more realistic adversarial malware examples without requiring any query to the malware detectors. Furthermore, we have tested the generated adversarial examples against state-of-the-art multimodal and deep learning malware detectors, showing a decrease in detection performance, as well as a decrease in the average number of detections by the antimalware engines in VirusTotal.","PeriodicalId":220415,"journal":{"name":"2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EuroSPW59978.2023.00052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Malware detectors based on machine learning (ML) have been shown to be susceptible to adversarial malware examples. However, current methods to generate adversarial malware examples still have their limits. They either rely on detailed model information (gradient-based attacks) or on detailed model outputs, such as class probabilities (score-based attacks), neither of which is available in real-world scenarios. Alternatively, adversarial examples might be crafted using only the label assigned by the detector (label-based attacks) to train a substitute network or an agent via reinforcement learning. Nonetheless, label-based attacks might require querying a black-box system anywhere from a handful to thousands of times, depending on the approach, which might not be feasible against malware detectors. This work presents a novel query-free approach to craft adversarial malware examples that evade ML-based malware detectors. To this end, we have devised a GAN-based framework to generate adversarial malware examples that look similar to benign executables in the feature space. To demonstrate the suitability of our approach, we have applied the GAN-based attack to three common types of features usually employed by static ML-based malware detectors: (1) byte histogram features, (2) API-based features, and (3) string-based features. Results show that our model-agnostic approach performs on par with MalGAN, while generating more realistic adversarial malware examples without requiring any query to the malware detectors. Furthermore, we have tested the generated adversarial examples against state-of-the-art multimodal and deep learning malware detectors, showing a decrease in detection performance, as well as a decrease in the average number of detections by the antimalware engines in VirusTotal.
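To make the query-free idea concrete, below is a minimal, hypothetical sketch (not the authors' code) of the general technique the abstract describes: a MalGAN-style generator that perturbs a malware feature vector so that it resembles benign feature vectors, trained only against a discriminator fitted to benign samples, so no queries to the target detector are needed. PyTorch is assumed, all class and function names are illustrative, and the additive-feature constraint (suitable for binary API/string features) may differ from the authors' actual architecture.

```python
# Hypothetical sketch: query-free generation of adversarial malware features
# by matching the benign feature distribution (no access to the detector).
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (malware features, noise) -> adversarial feature vector."""
    def __init__(self, n_features: int, noise_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_features),
            nn.Sigmoid(),  # candidate features to add, each in [0, 1]
        )

    def forward(self, x_mal: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        added = self.net(torch.cat([x_mal, z], dim=1))
        # Only *add* features (e.g. extra imports or strings) so the original
        # malicious functionality is preserved; clamp to keep values valid.
        return torch.clamp(x_mal + added, max=1.0)

class Discriminator(nn.Module):
    """Distinguishes benign feature vectors from generated ones."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train_step(gen, disc, opt_g, opt_d, x_mal, x_benign, noise_dim=32):
    """One illustrative GAN step: push malware features toward benign ones."""
    bce = nn.BCEWithLogitsLoss()
    z = torch.randn(x_mal.size(0), noise_dim)

    # Discriminator: label benign vectors 1, generated adversarial vectors 0.
    opt_d.zero_grad()
    d_loss = bce(disc(x_benign), torch.ones(x_benign.size(0), 1)) + \
             bce(disc(gen(x_mal, z).detach()), torch.zeros(x_mal.size(0), 1))
    d_loss.backward()
    opt_d.step()

    # Generator: make adversarial examples look benign to the discriminator.
    opt_g.zero_grad()
    g_loss = bce(disc(gen(x_mal, z)), torch.ones(x_mal.size(0), 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Under these assumptions, the same loop would be run separately over byte histogram, API-based, or string-based feature vectors, and the resulting adversarial examples could then be evaluated against held-out ML detectors without ever querying them during training.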