{"title":"Android Malware Family Classification and Characterization Using CFG and DFG","authors":"Zhiwu Xu, Kerong Ren, Fu Song","doi":"10.1109/TASE.2019.00-20","DOIUrl":null,"url":null,"abstract":"Android malware has become a serious threat for our daily life, and thus there is a pressing need to effectively mitigate or defend against them. Recently, many approaches and tools to analyze Android malware have been proposed to protect legitimate users from the threat. However, most approaches focus on malware detection, while only a few of them consider malware classification or malware characterization. In this paper, we propose an extension of CDGDroid to classifying and characterizing Android malware families automatically. We first perform static analysis used in CDGDroid to extract control-flow graphs and data-flow graphs on the instruction level. Then we encode the graphs into matrices, and use them to build the family classification models via deep learning. For family characterization, we extract the n-gram sequences from the graphs, which are filtered according to the weights of the classification model built for the target family. And then we construct a vector space model and select the top-k sequences as a characterization of the target family. We have conducted some experiments to evaluate our approach and have identified that the family classification model taking the horizontal combination of CFG and DFG as features offers the best performance in terms of accuracy among all the models. Compared with CDGDroid, Drebin and many antivirus tools gathered in VirusTotal, our family classification model gives a better performance. Finally, We have also conducted experiments on family characterization, and the experimental results have shown that our characterization can capture the malicious behaviors of the testing families.","PeriodicalId":183749,"journal":{"name":"2019 International Symposium on Theoretical Aspects of Software Engineering (TASE)","volume":"18 7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Symposium on Theoretical Aspects of Software Engineering (TASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TASE.2019.00-20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37
Abstract
Android malware has become a serious threat for our daily life, and thus there is a pressing need to effectively mitigate or defend against them. Recently, many approaches and tools to analyze Android malware have been proposed to protect legitimate users from the threat. However, most approaches focus on malware detection, while only a few of them consider malware classification or malware characterization. In this paper, we propose an extension of CDGDroid to classifying and characterizing Android malware families automatically. We first perform static analysis used in CDGDroid to extract control-flow graphs and data-flow graphs on the instruction level. Then we encode the graphs into matrices, and use them to build the family classification models via deep learning. For family characterization, we extract the n-gram sequences from the graphs, which are filtered according to the weights of the classification model built for the target family. And then we construct a vector space model and select the top-k sequences as a characterization of the target family. We have conducted some experiments to evaluate our approach and have identified that the family classification model taking the horizontal combination of CFG and DFG as features offers the best performance in terms of accuracy among all the models. Compared with CDGDroid, Drebin and many antivirus tools gathered in VirusTotal, our family classification model gives a better performance. Finally, We have also conducted experiments on family characterization, and the experimental results have shown that our characterization can capture the malicious behaviors of the testing families.