FAMCF: A few-shot Android malware family classification framework

IF 4.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Journal: Computers & Security
Publication date: 2024-07-29
Publication type: Journal Article
DOI: 10.1016/j.cose.2024.104027
URL: https://www.sciencedirect.com/science/article/pii/S0167404824003328
Citations: 0

Abstract

FAMCF: A few-shot Android malware family classification framework

Android malware is a major cyber threat to the popular Android platform and may affect millions of end users. To combat it, a large number of machine learning-based approaches have been developed and have achieved promising results. However, the vast majority of existing work relies on a large number of labeled samples, which are unfortunately not available for newly reported Android malware families. Classifying such few-shot Android malware families is therefore a critical challenge. In this paper, we propose FAMCF, a novel few-shot learning-based classification pipeline that addresses this problem. Because labeled samples from few-shot malware families are insufficient, we learn how to extract features by training on a separate base dataset that is much larger but whose label space is disjoint from that of the few-shot families. We consider three types of features obtained through static analysis: permissions, API calls, and opcodes. We train one classifier per feature type using a metric-based few-shot learning approach and combine the classifiers into an ensemble decision. Specifically, for each classifier, given a query sample to be classified, we compare it to the prototypes of all the families, which are generated in a query-dependent way. We compare the classification performance of FAMCF with existing solutions from multiple categories, including traditional machine learning-based approaches, few-shot Android malware classification approaches, and state-of-the-art few-shot learning methods from other fields. We also analyze the robustness of FAMCF against multiple popular obfuscation techniques. Extensive experiments on the popular Drebin and CICInvesAndMal2019 datasets confirm the effectiveness and robustness of FAMCF in classifying few-shot Android malware families: for example, we achieve an improvement of at least 4.86% in classification accuracy on Drebin and keep the decrease in accuracy within 1% under seven common types of obfuscation techniques.
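
The abstract does not say how the prototypes are made query-dependent or how the three classifiers are combined, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that each family prototype is an attention-weighted mean of that family's support embeddings (weights from a softmax over dot-product similarity to the query), that each classifier scores families by negative squared Euclidean distance, and that the ensemble averages the per-view probabilities. All function names and the toy embeddings are hypothetical; in the paper the embeddings come from feature extractors trained on the base dataset.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_dependent_prototype(query, support):
    """Assumed mechanism: weight each support embedding by its
    softmax-normalised similarity to the query, then take the weighted mean."""
    weights = softmax([dot(query, s) for s in support])
    return [
        sum(w * s[i] for w, s in zip(weights, support))
        for i in range(len(query))
    ]


def classify_view(query, support_by_family):
    """Return {family: probability} for one feature view."""
    families = sorted(support_by_family)
    prototypes = [
        query_dependent_prototype(query, support_by_family[f]) for f in families
    ]
    probabilities = softmax([-squared_distance(query, p) for p in prototypes])
    return dict(zip(families, probabilities))


def ensemble_predict(query_by_view, support_by_view):
    """Assumed ensemble rule: average the per-view probabilities."""
    totals = {}
    for view, query in query_by_view.items():
        view_probs = classify_view(query, support_by_view[view])
        for family, p in view_probs.items():
            totals[family] = totals.get(family, 0.0) + p / len(query_by_view)
    return max(totals, key=totals.get), totals


# Toy 2-way 2-shot episode with made-up 2-dimensional embeddings per view.
toy_support = {
    "family_a": [[1.0, 0.0], [0.9, 0.1]],
    "family_b": [[0.0, 1.0], [0.1, 0.9]],
}
support_by_view = {v: toy_support for v in ("permissions", "api_calls", "opcodes")}
query_by_view = {v: [0.8, 0.2] for v in ("permissions", "api_calls", "opcodes")}

label, probs = ensemble_predict(query_by_view, support_by_view)
print(label, probs)
```

With a single support sample per family, the attention weight is 1 and the prototype reduces to that sample, so this sketch degrades to plain nearest-prototype classification in the one-shot case.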

Source journal
Computers & Security (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months
About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.