Family Classification based on Tree Representations for Malware

Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems. Pub Date: 2023-08-24. DOI: 10.1145/3609510.3609818

Yang Xu, Zhuotai Chen



Abstract




Malware classification supports malware detection and analysis, and classifying malware into families is a multi-class classification task. Many studies have used API call sequences as malware features. However, API call sequences do not explicitly express the control structures between API calls, information that may help represent malware behavior more accurately. In this paper, we propose a novel malware family classification method. We model each malware sample as a behavior tree, built from the API call sequence obtained through dynamic analysis, which describes the control structure between the API calls. To reduce computational complexity, we extract a set of binary relations, called Heighted Behavior Relations, from the behavior tree as the behavior features of the malware. TF-IDF is used to compute the behavior features of each family from the behavior features of its malware. A similarity vector is then constructed for each malware sample from its similarity to every family. For family classification, the similarity vectors are fed into the Naive Bayes algorithm to train a classifier. Experiments on a dataset of 10,620 malware samples from 43 malware families show that the classification accuracy of our approach is 10% higher than that of classical methods based on API call sequences.
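The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define the tree encoding, the exact form of a Heighted Behavior Relation, the TF-IDF variant, or the similarity measure, so everything below is an assumption made for illustration: a node is a hypothetical `(api_name, children)` tuple, a relation is taken to be a `(parent, child, parent_height)` triple, each family is treated as one TF-IDF document with a smoothed IDF, and similarity is cosine similarity. The API names are illustrative only, and this is not the authors' implementation.

```python
import math
from collections import Counter


def height(node):
    """Height of an (api_name, children) node; a leaf has height 1."""
    _, children = node
    return 1 + max((height(child) for child in children), default=0)


def relations(node):
    """Collect (parent, child, parent_height) triples from a tree.

    Hypothetical stand-in for the paper's Heighted Behavior Relations.
    """
    name, children = node
    found = set()
    for child in children:
        found.add((name, child[0], height(node)))
        found |= relations(child)
    return found


def family_profiles(samples_by_family):
    """TF-IDF weight of each relation per family (one 'document' per family).

    The smoothed IDF, log((1 + N) / (1 + df)) + 1, is an assumed variant.
    """
    counts = {fam: Counter(r for sample in samples for r in sample)
              for fam, samples in samples_by_family.items()}
    n_families = len(counts)
    doc_freq = Counter(r for c in counts.values() for r in c)
    profiles = {}
    for fam, c in counts.items():
        total = sum(c.values())
        profiles[fam] = {
            r: (k / total) * (math.log((1 + n_families) / (1 + doc_freq[r])) + 1)
            for r, k in c.items()
        }
    return profiles


def similarity_vector(sample, profiles, families):
    """Cosine similarity between a sample's relation set and each family profile."""
    vec = []
    for fam in families:
        prof = profiles[fam]
        dot = sum(prof.get(r, 0.0) for r in sample)
        norm = math.sqrt(len(sample)) * math.sqrt(sum(w * w for w in prof.values()))
        vec.append(dot / norm if norm else 0.0)
    return vec


# Two toy behavior trees with illustrative API names.
tree_a = ("root", [("CreateFile", [("WriteFile", [])]), ("RegSetValue", [])])
tree_b = ("root", [("socket", [("connect", []), ("send", [])])])

families = ["famA", "famB"]
profiles = family_profiles({"famA": [relations(tree_a)],
                            "famB": [relations(tree_b)]})
vec_a = similarity_vector(relations(tree_a), profiles, families)
print(vec_a)
```

In this sketch, `vec_a` has one entry per family; vectors of this kind, computed for labeled training samples, are what would be passed to a Naive Bayes classifier in the final step, which is not implemented here.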
