Statistical Deobfuscation of Android Applications

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. Pub Date: 2016-10-24. DOI: 10.1145/2976749.2978422

Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin T. Vechev


Citations: 112

Abstract

This work presents a new approach for deobfuscating Android APKs based on probabilistic learning of large code bases (termed "Big Code"). The key idea is to learn a probabilistic model over thousands of non-obfuscated Android applications and to use this probabilistic model to deobfuscate new, unseen Android APKs. The concrete focus of the paper is on reversing layout obfuscation, a popular transformation which renames key program elements such as classes, packages, and methods, thus making it difficult to understand what the program does. Concretely, the paper: (i) phrases the layout deobfuscation problem of Android APKs as structured prediction in a probabilistic graphical model, (ii) instantiates this model with a rich set of features and constraints that capture the Android setting, ensuring both semantic equivalence and high prediction accuracy, and (iii) shows how to leverage powerful inference and learning algorithms to achieve overall precision and scalability of the probabilistic predictions. We implemented our approach in a tool called DeGuard and used it to: (i) reverse the layout obfuscation performed by the popular ProGuard system on benign, open-source applications, (ii) predict third-party libraries imported by benign APKs (also obfuscated by ProGuard), and (iii) rename obfuscated program elements of Android malware. The experimental results indicate that DeGuard is practically effective: it recovers 79.1% of the program element names obfuscated with ProGuard, it predicts third-party libraries with an accuracy of 91.3%, and it reveals string decoders and classes that handle sensitive data in Android malware.
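To make the structured-prediction framing concrete, the following is a minimal toy sketch, not DeGuard's implementation. It assumes a tiny dependency graph in which obfuscated elements (`a`, `b`) are linked to each other and to a known Android API class, scores each joint naming with pairwise feature weights, and picks the highest-scoring assignment that satisfies a simple naming constraint. All weights, candidate names, relation labels, and the exhaustive search are hypothetical stand-ins chosen for illustration; the abstract does not specify the features or the inference algorithm at this level of detail.

```python
from itertools import product

# Hypothetical pairwise feature weights, as if learned from non-obfuscated apps.
# Key: (name of first node, relation, name of second node) -> weight.
WEIGHTS = {
    ("DbHelper", "extends", "SQLiteOpenHelper"): 0.9,
    ("HttpClient", "extends", "SQLiteOpenHelper"): 0.1,
    ("DbHelper", "has_method", "getDb"): 0.8,
    ("DbHelper", "has_method", "sendRequest"): 0.1,
    ("HttpClient", "has_method", "sendRequest"): 0.7,
    ("HttpClient", "has_method", "getDb"): 0.05,
}

# Obfuscated elements and their (hypothetical) candidate names.
CANDIDATES = {
    "a": ["DbHelper", "HttpClient"],
    "b": ["getDb", "sendRequest"],
}

# Known elements keep their names, e.g. Android API classes that are not renamed.
KNOWN = {"SQLiteOpenHelper": "SQLiteOpenHelper"}

# Edges of the toy dependency graph: (node, relation, node).
EDGES = [("a", "extends", "SQLiteOpenHelper"), ("a", "has_method", "b")]


def score(assignment):
    """Sum the weights of all edge features under a joint naming."""
    names = {**KNOWN, **assignment}
    return sum(WEIGHTS.get((names[u], rel, names[v]), 0.0) for u, rel, v in EDGES)


def satisfies_constraints(assignment):
    """Toy constraint: no two renamed elements may receive the same name."""
    values = list(assignment.values())
    return len(values) == len(set(values))


def map_inference():
    """Exhaustively search for the best-scoring valid joint assignment."""
    unknown = list(CANDIDATES)
    best, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[n] for n in unknown)):
        assignment = dict(zip(unknown, combo))
        if not satisfies_constraints(assignment):
            continue
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score


if __name__ == "__main__":
    best, best_score = map_inference()
    print(best, round(best_score, 2))
```

The point of the sketch is that names are predicted jointly: the choice for `a` is supported both by its known superclass and by the name chosen for its method `b`. Exhaustive enumeration only works for toy graphs; a real application has far too many elements, which is why the paper stresses scalable inference and learning.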
