FCGAT: Interpretable Malware Classification Method using Function Call Graph and Attention Mechanism

Proceedings 2023 Workshop on Binary Analysis Research | Pub Date: 2023 | DOI: 10.14722/bar.2023.23005

Minami Someya, Yuhei Otsubo, Akira Otsuka


Citations: 1

Abstract

Malware classification facilitates static analysis, which is labor-intensive but necessary for understanding the inner workings of unknown malware. Machine-learning-based approaches have been actively studied and show great potential. Their drawback, however, is that the models are treated as black boxes: their classification results are difficult to explain, so they cannot reveal patterns specific to malware. To address this problem, we propose FCGAT, the first malware classification method that provides interpretable classification reasons based on program functions. FCGAT applies natural language processing techniques to create function features and updates them to reflect the calling relationships between functions. It then applies an attention mechanism to create a malware feature, using attention weights to emphasize the functions that are important for classification. As an explanation, FCGAT provides an importance ranking of functions based on these attention weights. We evaluate the performance of FCGAT on two datasets. The F1-scores are 98.15% and 98.18%, which are competitive with cutting-edge methods. Furthermore, we examine how much the functions emphasized by FCGAT contribute to the classification. Surprisingly, our results show that the top 6 highly weighted functions alone (average per sample) yield as much as 70% accuracy. By analyzing these functions, we also show that they reflect the characteristics of malware. FCGAT can thus provide analysts with reliable explanations using a small number of functions. These explanations could bring various benefits, such as more efficient malware analysis and comprehensive malware trend analysis.
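
The pipeline described in the abstract (function features, an update along call relationships, attention pooling, and an importance ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the averaging rule in `propagate`, the dot-product scoring against a fixed `query` vector, and all names and toy values below are assumptions chosen only to make the idea concrete. In a trained model the function features and the attention parameters would be learned.

```python
import math


def propagate(features, calls):
    """Blend each function's vector with the mean of its callees' vectors.

    Hypothetical stand-in for the step that makes function features
    reflect calling relationships; `calls` maps caller index -> callee indices.
    """
    updated = []
    for i, vec in enumerate(features):
        callees = calls.get(i, [])
        if not callees:
            updated.append(list(vec))
            continue
        mean = [sum(features[j][d] for j in callees) / len(callees)
                for d in range(len(vec))]
        updated.append([(v + m) / 2 for v, m in zip(vec, mean)])
    return updated


def attention_pool(features, query):
    """Softmax over dot-product scores, then a weighted sum of the vectors."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


def rank_functions(names, weights, top_k):
    """Return the top_k (name, weight) pairs, highest attention weight first."""
    order = sorted(range(len(names)), key=lambda i: weights[i], reverse=True)
    return [(names[i], weights[i]) for i in order[:top_k]]


# Toy sample: three functions with made-up 2-dimensional features.
names = ["main", "decrypt_payload", "log_message"]
features = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.0]]
calls = {0: [1, 2]}   # main calls the other two functions
query = [1.0, 1.0]    # stands in for learned attention parameters

hidden = propagate(features, calls)
malware_feature, weights = attention_pool(hidden, query)
ranking = rank_functions(names, weights, top_k=2)
print(ranking)
```

The pooled `malware_feature` would be passed to a classifier, while `ranking` is the kind of per-sample explanation the abstract refers to: a short list of functions ordered by attention weight.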
