FAMCF: A few-shot Android malware family classification framework
Computers & Security (journal article), published 2024-07-29. DOI: 10.1016/j.cose.2024.104027
https://www.sciencedirect.com/science/article/pii/S0167404824003328
Android malware is a major cyber threat to the popular Android platform and can affect millions of end users. Many machine learning-based approaches have been developed to combat it and have achieved promising results. However, the vast majority of existing work relies on large numbers of labeled samples, which are unavailable for newly reported Android malware families. This makes classifying such few-shot Android malware families a critical challenge. In this paper, we propose FAMCF, a novel few-shot learning-based classification pipeline that addresses this problem. Because labeled samples from few-shot malware families are scarce, we learn how to extract features by training on a separate base dataset that is much larger but whose label space is disjoint from that of the few-shot families. We consider three types of static-analysis features: permissions, API calls, and opcodes. For each feature type, we train a classifier using a metric-based few-shot learning approach, and the classifiers are combined into an ensemble decision. Specifically, each classifier compares a query sample to the prototypes of all families, where the prototypes are generated in a query-dependent way. We compare the classification performance of FAMCF with several categories of existing solutions: traditional machine learning-based approaches, few-shot Android malware classification approaches, and state-of-the-art few-shot learning methods from other fields. We also analyze the robustness of FAMCF against multiple popular obfuscation techniques.
Extensive experiments on the popular Drebin and CICInvesAndMal2019 datasets confirm the effectiveness and robustness of FAMCF in classifying few-shot Android malware families. For example, FAMCF improves classification accuracy on Drebin by at least 4.86% and keeps the drop in accuracy within 1% under seven common types of obfuscation techniques.
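The abstract describes the classification step only at a high level. The sketch below illustrates one plausible reading of a metric-based few-shot classifier with query-dependent prototypes, combined across the three feature views (permissions, API calls, opcodes). It is not FAMCF's actual implementation: the distance-weighted averaging of support samples, the `temperature` parameter, probability averaging as the ensemble rule, and all function and family names (`query_dependent_prototype`, `classify_view`, `FamA`, ...) are hypothetical. The embeddings are assumed to come from a feature extractor already trained on the base dataset.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_dependent_prototype(query: Vector, support: List[Vector],
                              temperature: float = 1.0) -> List[float]:
    """Hypothetical query-dependent prototype (assumed scheme, not FAMCF's).

    Each support embedding is weighted by its closeness to the query, so the
    prototype shifts toward the support samples most similar to the query.
    A very large temperature makes the weights uniform, i.e. the plain mean.
    """
    weights = softmax([-sq_dist(query, s) / temperature for s in support])
    return [sum(w * s[d] for w, s in zip(weights, support))
            for d in range(len(query))]


def classify_view(query: Vector, support_set: Dict[str, List[Vector]],
                  temperature: float = 1.0) -> Dict[str, float]:
    """Probability per family for one feature view, from distances to prototypes."""
    families = sorted(support_set)
    protos = [query_dependent_prototype(query, support_set[f], temperature)
              for f in families]
    probs = softmax([-sq_dist(query, p) for p in protos])
    return dict(zip(families, probs))


def ensemble(view_probs: List[Dict[str, float]]) -> str:
    """Assumed ensemble rule: average per-view probabilities, pick the argmax."""
    families = view_probs[0].keys()
    avg = {f: sum(p[f] for p in view_probs) / len(view_probs) for f in families}
    return max(avg, key=avg.get)


# Toy 2-shot episode: two hypothetical families, three feature views,
# each view given as (support set, query embedding) in a 2-D embedding space.
views = {
    "permissions": ({"FamA": [[0.0, 0.0], [0.2, 0.1]],
                     "FamB": [[3.0, 3.0], [2.8, 3.2]]}, [0.1, 0.2]),
    "api_calls":   ({"FamA": [[1.0, 0.0], [1.1, 0.2]],
                     "FamB": [[0.0, 2.0], [0.1, 2.2]]}, [0.9, 0.1]),
    "opcodes":     ({"FamA": [[5.0, 5.0], [5.2, 4.9]],
                     "FamB": [[0.0, 0.0], [0.3, 0.1]]}, [4.8, 5.1]),
}
per_view = [classify_view(q, support) for support, q in views.values()]
print(ensemble(per_view))  # → FamA
```

The `temperature` knob makes the design choice explicit: with a large value every support sample counts equally, recovering the class-mean prototype of standard prototypical networks, while a small value lets the support samples nearest the query dominate the prototype.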
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at professionals involved with computer security, audit, control, and data integrity in all sectors: industry, commerce, and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.