FAMCF: A few-shot Android malware family classification framework

IF 4.8, Zone 2 (Computer Science), Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Computers & Security, Pub Date: 2024-07-29, DOI: 10.1016/j.cose.2024.104027

Citations: 0

Abstract

Android malware is a major cyber threat to the popular Android platform and may affect millions of end users. To combat it, a large number of machine learning-based approaches have been developed and have achieved promising results. However, the vast majority of existing work relies on a large number of labeled samples, which are unfortunately not available for newly reported Android malware families. This poses a critical challenge for classifying such few-shot Android malware families. In this paper, we propose FAMCF, a novel few-shot learning-based classification pipeline, to solve the problem. Faced with insufficient labeled samples from few-shot malware families, we learn how to extract features by training on another base dataset that is much larger but whose label space is disjoint from that of the few-shot families. We consider three types of features based on static analysis, namely permissions, API calls, and opcodes. We train a classifier for each type of feature, using a metric-based few-shot learning approach, and combine the classifiers into an ensemble decision. Specifically, for each classifier, given a query sample to be classified, we propose to compare it to the prototypes of all the families, which are generated in a query-dependent way. We compare the classification performance of FAMCF with that of existing solutions in multiple categories, including traditional machine learning-based approaches, few-shot Android malware classification approaches, and state-of-the-art few-shot learning methods from other fields. We also analyze the robustness of FAMCF against multiple popular obfuscation techniques. Extensive experiments on the popular Drebin and CICInvesAndMal2019 datasets confirm the effectiveness and robustness of FAMCF in classifying few-shot Android malware families; for example, we achieve at least a 4.86% improvement in classification accuracy on Drebin and keep the decrease in accuracy within 1% under seven common types of obfuscation techniques.
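
The abstract's idea of comparing a query against query-dependent family prototypes, once per feature type, and then taking an ensemble decision can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the prototypes depend on the query or how the three classifiers are combined. The sketch assumes, purely for illustration, that each support embedding is weighted by a softmax over its negative squared distance to the query, and that the ensemble averages per-view probabilities. The family names, the 2-D embeddings, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_dependent_prototype(query: Vector, support: List[Vector]) -> Vector:
    """Weighted mean of support embeddings; closer samples weigh more (assumed scheme)."""
    weights = softmax([-sq_dist(query, s) for s in support])
    return [sum(w * s[i] for w, s in zip(weights, support)) for i in range(len(query))]


def classify_view(query: Vector, support_by_family: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Family probabilities for one feature view, from distances to the prototypes."""
    families = list(support_by_family)
    scores = [
        -sq_dist(query, query_dependent_prototype(query, support_by_family[f]))
        for f in families
    ]
    return dict(zip(families, softmax(scores)))


def ensemble(view_probs: List[Dict[str, float]]) -> str:
    """Average the per-view probabilities and return the top family (assumed rule)."""
    families = list(view_probs[0])
    avg = {f: sum(p[f] for p in view_probs) / len(view_probs) for f in families}
    return max(avg, key=avg.get)


# Toy 2-shot, 2-family episode; embeddings stand in for learned features.
toy_support = {"FamA": [[0.0, 0.0], [0.2, 0.1]], "FamB": [[1.0, 1.0], [0.9, 1.2]]}
support_by_view = {
    "permissions": toy_support,
    "api_calls": toy_support,
    "opcodes": toy_support,
}
query_by_view = {
    "permissions": [0.1, 0.0],
    "api_calls": [0.1, 0.2],
    "opcodes": [0.9, 0.9],  # this view alone points to the other family
}

view_probs = {
    view: classify_view(query_by_view[view], support_by_view[view])
    for view in support_by_view
}
prediction = ensemble(list(view_probs.values()))
print(prediction)
```

In this toy episode the opcode view disagrees with the other two, so the example shows how an averaged ensemble can still settle on the family that most views support. A real pipeline would replace the hand-written vectors with embeddings from feature extractors trained on the base dataset.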

Source journal

Computers & Security (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 12.40

Self-citation rate: 7.10%

Articles published: 365

Review time: 10.7 months

Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.