Android Malware Family Classification and Characterization Using CFG and DFG

2019 International Symposium on Theoretical Aspects of Software Engineering (TASE) · Pub Date: 2019-07-01 · DOI: 10.1109/TASE.2019.00-20

Zhiwu Xu, Kerong Ren, Fu Song


Citations: 37

Abstract

Android malware has become a serious threat to our daily life, and thus there is a pressing need to effectively mitigate or defend against it. Recently, many approaches and tools for analyzing Android malware have been proposed to protect legitimate users from this threat. However, most approaches focus on malware detection, while only a few consider malware classification or malware characterization. In this paper, we propose an extension of CDGDroid that classifies and characterizes Android malware families automatically. We first perform the static analysis used in CDGDroid to extract control-flow graphs (CFG) and data-flow graphs (DFG) at the instruction level. Then we encode the graphs into matrices and use them to build family classification models via deep learning. For family characterization, we extract n-gram sequences from the graphs and filter them according to the weights of the classification model built for the target family. We then construct a vector space model and select the top-k sequences as a characterization of the target family. We have conducted experiments to evaluate our approach and found that the family classification model taking the horizontal combination of CFG and DFG as features offers the best accuracy among all the models. Compared with CDGDroid, Drebin, and many antivirus tools gathered in VirusTotal, our family classification model performs better. Finally, we have also conducted experiments on family characterization, and the results show that our characterization can capture the malicious behaviors of the tested families.
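The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the opcode vocabulary, the edge-count encoding, and the TF-IDF-style scoring standing in for the vector space model are all assumptions made for illustration, and the step that filters n-grams by the classification model's weights is omitted.

```python
from collections import Counter
import math

# Hypothetical opcode vocabulary; the instruction set used in the paper is not given here.
OPCODES = ["move", "invoke", "if", "goto", "return"]
INDEX = {op: i for i, op in enumerate(OPCODES)}


def encode_graph(edges):
    """Encode opcode-to-opcode edges as a |V| x |V| matrix of edge counts."""
    n = len(OPCODES)
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[INDEX[src]][INDEX[dst]] += 1
    return matrix


def horizontal_combination(cfg, dfg):
    """Concatenate the CFG and DFG matrices row by row (|V| x 2|V| features)."""
    return [c_row + d_row for c_row, d_row in zip(cfg, dfg)]


def ngrams(sequence, n):
    """Return all contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def top_k_sequences(family_docs, other_docs, n, k):
    """Rank the target family's n-grams by an assumed TF-IDF-style score."""
    all_docs = family_docs + other_docs
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(ngrams(doc, n)))  # count each n-gram once per sample
    term_freq = Counter()
    for doc in family_docs:
        term_freq.update(ngrams(doc, n))
    scores = {
        gram: tf * math.log(len(all_docs) / doc_freq[gram])
        for gram, tf in term_freq.items()
    }
    # Sort by descending score; break ties lexicographically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


if __name__ == "__main__":
    cfg = encode_graph([("move", "invoke"), ("invoke", "if"), ("if", "goto")])
    dfg = encode_graph([("move", "invoke")])
    features = horizontal_combination(cfg, dfg)
    print(len(features), len(features[0]))

    family = [["move", "invoke", "return"], ["move", "invoke", "goto"]]
    others = [["if", "goto", "return"]]
    print(top_k_sequences(family, others, n=2, k=2))
```

In this toy setup, n-grams that occur in every sample score low, while those concentrated in the target family rank higher, which is the intuition behind selecting top-k sequences as a family characterization.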

