GMADV：安卓恶意软件变体生成与分类对抗训练框架

IF 3.8 2区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Security and Applications Pub Date : 2024-06-07 DOI:10.1016/j.jisa.2024.103800

Shuangcheng Li , Zhangguo Tang , Huanzhou Li , Jian Zhang , Han Wang , Junfeng Wang

{"title":"GMADV：安卓恶意软件变体生成与分类对抗训练框架","authors":"Shuangcheng Li , Zhangguo Tang , Huanzhou Li , Jian Zhang , Han Wang , Junfeng Wang","doi":"10.1016/j.jisa.2024.103800","DOIUrl":null,"url":null,"abstract":"<div><p>Android malware uses anti-reverse analysis and APK shelling technology, which leads to the failure of the classification method based on decompiled features and the reduction of the classification accuracy based on single file features. Moreover, the lack of samples in some families of Android malware makes the classification model based on sample learning ineffective. To solve the above problems, this paper proposes a two-layer general framework for Android malware classification and adversarial training named GMADV, which enhances classifier performance through adversarial training. In the sample classification layer, based on the transformation method of the Markov model, it is proposed for the first time to convert the three files in the APK into RGB Markov images, and use VGG13 to automatically extract features and classification; In the variant amplification layer, the idea of \"regression for generation\" is firstly proposed, and GMM-GAN based on Gaussian process is designed to amplify the diversity of samples within the family. The experimental results show that RGB Markov images have better classification performance than grayscale images. On the three datasets, the classification effect after amplification has been improved to varying degrees, and all F1_Score reaches 95 %. Compared with other methods, GMADV has stronger family sample amplification ability and greater adversarial intensity.</p></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"84 ","pages":"Article 103800"},"PeriodicalIF":3.8000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GMADV: An android malware variant generation and classification adversarial training framework\",\"authors\":\"Shuangcheng Li , Zhangguo Tang , Huanzhou Li , Jian Zhang , Han Wang , Junfeng Wang\",\"doi\":\"10.1016/j.jisa.2024.103800\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Android malware uses anti-reverse analysis and APK shelling technology, which leads to the failure of the classification method based on decompiled features and the reduction of the classification accuracy based on single file features. Moreover, the lack of samples in some families of Android malware makes the classification model based on sample learning ineffective. To solve the above problems, this paper proposes a two-layer general framework for Android malware classification and adversarial training named GMADV, which enhances classifier performance through adversarial training. In the sample classification layer, based on the transformation method of the Markov model, it is proposed for the first time to convert the three files in the APK into RGB Markov images, and use VGG13 to automatically extract features and classification; In the variant amplification layer, the idea of \\\"regression for generation\\\" is firstly proposed, and GMM-GAN based on Gaussian process is designed to amplify the diversity of samples within the family. The experimental results show that RGB Markov images have better classification performance than grayscale images. On the three datasets, the classification effect after amplification has been improved to varying degrees, and all F1_Score reaches 95 %. Compared with other methods, GMADV has stronger family sample amplification ability and greater adversarial intensity.</p></div>\",\"PeriodicalId\":48638,\"journal\":{\"name\":\"Journal of Information Security and Applications\",\"volume\":\"84 \",\"pages\":\"Article 103800\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Information Security and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214212624001030\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212624001030","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

安卓恶意软件采用反逆向分析和 APK 加壳技术，导致基于反编译特征的分类方法失效，基于单个文件特征的分类精度降低。此外，某些 Android 恶意软件家族缺乏样本，导致基于样本学习的分类模型失效。为解决上述问题，本文提出了一种用于安卓恶意软件分类和对抗训练的双层通用框架，命名为 GMADV，通过对抗训练提高分类器性能。在样本分类层，基于马尔可夫模型的变换方法，首次提出将APK中的三个文件转换为RGB马尔可夫图像，并利用VGG13自动提取特征并进行分类；在变体放大层，首次提出 "回归生成 "的思想，设计了基于高斯过程的GMM-GAN来放大族内样本的多样性。实验结果表明，RGB 马尔科夫图像的分类性能优于灰度图像。在三个数据集上，放大后的分类效果均有不同程度的提高，F1_Score 均达到 95%。与其他方法相比，GMADV 具有更强的族样本放大能力和更大的对抗强度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GMADV: An android malware variant generation and classification adversarial training framework

Android malware uses anti-reverse analysis and APK shelling technology, which leads to the failure of the classification method based on decompiled features and the reduction of the classification accuracy based on single file features. Moreover, the lack of samples in some families of Android malware makes the classification model based on sample learning ineffective. To solve the above problems, this paper proposes a two-layer general framework for Android malware classification and adversarial training named GMADV, which enhances classifier performance through adversarial training. In the sample classification layer, based on the transformation method of the Markov model, it is proposed for the first time to convert the three files in the APK into RGB Markov images, and use VGG13 to automatically extract features and classification; In the variant amplification layer, the idea of "regression for generation" is firstly proposed, and GMM-GAN based on Gaussian process is designed to amplify the diversity of samples within the family. The experimental results show that RGB Markov images have better classification performance than grayscale images. On the three datasets, the classification effect after amplification has been improved to varying degrees, and all F1_Score reaches 95 %. Compared with other methods, GMADV has stronger family sample amplification ability and greater adversarial intensity.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Information Security and Applications Computer Science-Computer Networks and Communications

CiteScore

10.90

自引率

5.40%

发文量

206

审稿时长

56 days

期刊介绍： Journal of Information Security and Applications (JISA) focuses on the original research and practice-driven applications with relevance to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view on modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.