Beyond the sandbox: Leveraging symbolic execution for evasive malware classification

IF 4.8 · Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Computers & Security · Pub Date: 2024-11-13 · DOI: 10.1016/j.cose.2024.104193

Vasilis Vouvoutsis, Fran Casino, Constantinos Patsakis

Computers & Security, Volume 149, Article 104193 (Journal Article). https://www.sciencedirect.com/science/article/pii/S016740482400498X

Citations: 0

Abstract

Threat actors continuously update their code to incorporate counter-analysis techniques designed to evade detection and hinder the blocking of their malware. The first line of defence for malware authors is often to bypass static analysis, a relatively straightforward task using readily available tools such as packers and cryptors. To address this shortcoming, defenders send potential malware samples for execution in a sandbox environment. While sandboxing can provide valuable insights into the behaviour of software on an information system, advanced techniques like anti-virtualisation and hooking evasion allow malware to escape detection. The primary objective of this work is to complement sandbox execution with symbolic execution frameworks to detect new malware strains efficiently. Symbolic execution offers a distinct advantage over sandboxing by achieving greater coverage of all possible execution traces, as it can explore every potential execution path, regardless of the evasion methods employed by the malware authors. By carefully selecting the samples to be analysed, we can significantly reduce the workload while extracting essential dynamic features in a fraction of the time and with far fewer computational resources compared to sandboxing. To this end, we leverage machine learning in an automated pipeline, enabling the accurate detection of sophisticated malware using a real-world dataset. Our approach yields average F1 scores of 0.93 for the benign class and 0.99 for the malware class in a binary classification setup, surpassing the detection rates reported in the literature. Additionally, our method outperforms a commercial malware sandbox when applied to the same dataset, further highlighting the efficacy of the proposed method.
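The coverage argument in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline or any real symbolic execution framework: real tools track symbolic values and use a constraint solver, whereas this sketch only forks on boolean environment checks. The names `SymbolicEnv`, `explore`, `sample`, and `sandbox_run`, as well as the behaviour labels, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Tuple


class Forked(Exception):
    """Raised when execution reaches an environment check with no recorded decision."""


class SymbolicEnv:
    """Toy stand-in for environment queries whose answers are treated as unknown."""

    def __init__(self, decisions: List[bool]) -> None:
        self.decisions = list(decisions)
        self.trace: List[Tuple[str, bool]] = []  # (query, answer) pairs on this path
        self._next = 0

    def query(self, name: str) -> bool:
        if self._next >= len(self.decisions):
            raise Forked()  # undecided branch: the explorer must try both outcomes
        answer = self.decisions[self._next]
        self._next += 1
        self.trace.append((name, answer))
        return answer


def explore(program: Callable) -> List[Tuple[List[Tuple[str, bool]], List[str]]]:
    """Enumerate every path by forking on each undecided environment check."""
    paths = []
    worklist: List[List[bool]] = [[]]
    while worklist:
        prefix = worklist.pop()
        env = SymbolicEnv(prefix)
        try:
            behaviour = program(env)
        except Forked:
            worklist.append(prefix + [True])
            worklist.append(prefix + [False])
            continue
        paths.append((env.trace, behaviour))
    return paths


def sample(env) -> List[str]:
    """Hypothetical evasive sample: it stays quiet when it suspects analysis."""
    if env.query("running_in_vm"):
        return ["exit"]
    if env.query("api_hooks_present"):
        return ["sleep", "exit"]
    return ["write_registry_run_key", "connect_c2"]


def sandbox_run(program: Callable) -> List[str]:
    """One concrete run, assuming the sandbox is detectable (every check answers True)."""

    class ConcreteEnv:
        def query(self, name: str) -> bool:
            return True

    return program(ConcreteEnv())


paths = explore(sample)
for trace, behaviour in paths:
    print(trace, "->", behaviour)
print("sandbox saw:", sandbox_run(sample))
```

Under these assumptions, the single concrete run observes only the benign-looking exit, while the path enumeration also reaches the branch where both checks fail and the hidden behaviour appears. The number of paths grows quickly with the number of checks, which is consistent with the abstract's emphasis on carefully selecting which samples to analyse.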

Source journal

Computers & Security (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 12.40

Self-citation rate: 7.10%

Articles published: 365

Review time: 10.7 months

About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.