Authors: Yang Xu, Zhuotai Chen
Published in: Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, 2023-08-24
DOI: https://doi.org/10.1145/3609510.3609818
Family Classification based on Tree Representations for Malware
Malware classification supports malware detection and analysis. Family classification of malware is a multi-class classification task. Many studies have used API call sequences as malware features. However, API call sequences do not explicitly express the control structures between API calls, which may help represent malware behavior more accurately. In this paper, we propose a novel malware family classification method. We model each malware sample as a Behavioral Tree built from the API call sequence obtained through dynamic analysis; the tree describes the control structure between the API calls. To reduce computational complexity, we extract a set of binary relations, called Heighted Behavior Relations, from the Behavioral Tree as the behavior features of the sample. TF-IDF is used to compute family behavior features from the behavior features of the samples. A similarity vector is then constructed for each sample from its similarity to every family. For family classification, the similarity vectors are fed to a Naive Bayes algorithm to train a classifier. Experiments on a dataset of 10620 malware samples from 43 malware families show that the classification accuracy of our approach is 10% higher than that of classical methods based on API call sequences.
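The feature pipeline described above (per-sample relations, TF-IDF family features, one similarity score per family) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how Heighted Behavior Relations are encoded, so the `(api_a, api_b, height)` tuples, the smoothed IDF formula, the use of cosine similarity, and the toy families `famA` and `famB` are all assumptions made for illustration. The resulting vectors are what would then be passed to a Naive Bayes classifier.

```python
import math
from collections import Counter


def family_tfidf(samples_by_family):
    """Compute one TF-IDF weight map per family.

    Assumption: each family is treated as a single "document" made of all
    relations found in its samples, and IDF is smoothed as
    log((1 + N) / (1 + df)) + 1. The abstract does not specify either choice.
    """
    n_families = len(samples_by_family)
    tf = {}
    for family, samples in samples_by_family.items():
        counts = Counter(rel for sample in samples for rel in sample)
        total = sum(counts.values())
        tf[family] = {rel: c / total for rel, c in counts.items()}
    # Document frequency: in how many families each relation occurs.
    df = Counter(rel for weights in tf.values() for rel in weights)
    idf = {rel: math.log((1 + n_families) / (1 + d)) + 1.0 for rel, d in df.items()}
    return {
        family: {rel: w * idf[rel] for rel, w in weights.items()}
        for family, weights in tf.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(rel, 0.0) for rel, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_vector(sample, family_features, families):
    """Similarity of one sample to every family, in a fixed family order."""
    sample_vec = {rel: 1.0 for rel in sample}  # binary presence (assumption)
    return [cosine(sample_vec, family_features[f]) for f in families]


# Hypothetical toy data: each sample is a set of (api_a, api_b, height) tuples.
toy = {
    "famA": [
        {("CreateFile", "WriteFile", 1), ("WriteFile", "CloseHandle", 1)},
        {("CreateFile", "WriteFile", 1), ("RegOpenKey", "RegSetValue", 2)},
    ],
    "famB": [
        {("socket", "connect", 1), ("connect", "send", 2)},
        {("socket", "connect", 1), ("CreateFile", "WriteFile", 1)},
    ],
}
families = sorted(toy)
features = family_tfidf(toy)
vec = similarity_vector(
    {("socket", "connect", 1), ("connect", "send", 2)}, features, families
)
print(dict(zip(families, vec)))
```

In this toy run the sample shares no relations with `famA` and two with `famB`, so its similarity vector has a zero in the `famA` position and a clearly larger value for `famB`. With 43 families, each sample would yield a 43-dimensional vector, which keeps the classifier input small regardless of how many distinct relations exist.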