Efficient Malware Analysis Using Metric Embeddings

Digital Threats: Research and Practice. Pub Date: 2022-12-05. DOI: 10.1145/3615669

Ethan M. Rudd, David B. Krisiloff, Scott E. Coull, Daniel Olszewski, Edward Raff, James Holt



Abstract

Real-world malware analysis consists of a complex pipeline of classifiers and data analysis, from detection to classification of capabilities to retrieval of unique training samples from user systems. In this paper, we aim to reduce the complexity of these pipelines through the use of low-dimensional metric embeddings of Windows PE files, which can be used in a variety of downstream applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich the labeling of malicious and benign PE files with computationally expensive, disassembly-based information about malicious capabilities. Using this enhanced labeling, we derive several different types of efficient metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. Our evaluation examines performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that low-dimensional, computationally efficient metric embeddings maintain performance with little decay. This offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced overhead and complexity. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and the introduction of task-specific auxiliary objectives to improve performance on mission-critical tasks.
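The abstract names contrastive loss and Spearman rank correlation as training signals but does not give their exact form. As a rough illustration only, the sketch below uses the classic pairwise contrastive loss and computes Spearman correlation as the Pearson correlation of ranks. The function names, the `margin` parameter, and the toy inputs are assumptions made for this example and are not taken from the paper; how the authors turn rank correlation into a trainable objective is not described here.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Classic pairwise contrastive loss (assumed form, not the paper's exact loss).

    emb_a, emb_b: arrays of shape (n_pairs, dim) holding paired embeddings.
    same: array of shape (n_pairs,), 1 if the pair shares a label, else 0.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=float)
    # Euclidean distance between the two embeddings of each pair.
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    # Similar pairs are pulled together; dissimilar pairs are pushed
    # apart until they are at least `margin` away from each other.
    pull = same * dist ** 2
    push = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))


def spearman_rank_correlation(x, y):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx ** 2) * np.sum(ry ** 2)))


if __name__ == "__main__":
    # Toy pairs: the first is labeled similar, the second dissimilar.
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [[3.0, 4.0], [0.6, 0.8]]
    print(contrastive_loss(a, b, same=[1, 0], margin=2.0))
    print(spearman_rank_correlation([1, 2, 3, 4], [10, 20, 30, 40]))
```

In a setup like the one the abstract describes, the pair labels could come from the enriched capability labeling, and a rank-based term would compare the ordering of embedding distances against the ordering of label similarities; both of those readings are assumptions about the general idea rather than details confirmed by the text.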


