Obfuscation-Resilient Privacy Leak Detection for Mobile Apps Through Differential Analysis

Proceedings 2019 Network and Distributed System Security Symposium. Pub Date: 2017-02-01. DOI: 10.14722/NDSS.2017.23465

Andrea Continella, Y. Fratantonio, Martina Lindorfer, Alessandro Puccetti, Ali Zand, Christopher Krügel, G. Vigna


Citations: 105

Abstract

Mobile apps are notorious for collecting a wealth of private information from users. Despite significant effort from the research community in developing privacy leak detection tools based on data flow tracking inside the app or on network traffic analysis, it is still unclear whether apps and ad libraries can hide the fact that they are leaking private information. In fact, all existing analysis tools have limitations: data flow tracking suffers from imprecision that causes false positives, as well as false negatives when the data flow from a source of private information to a network sink is interrupted; network traffic analysis, on the other hand, cannot handle encryption or custom encoding. We propose a new approach to privacy leak detection that is not affected by such limitations and that is also resilient to obfuscation techniques, such as encoding, formatting, encryption, or any other kind of transformation performed on private information before it is leaked. Our work is based on black-box differential analysis, and it works in two steps: first, it establishes a baseline of the network behavior of an app; then, it modifies sources of private information, such as the device ID and location, and detects leaks by observing deviations in the resulting network traffic. The basic concept of black-box differential analysis is not novel, but, unfortunately, it is not practical enough to precisely analyze modern mobile apps. In fact, their network traffic contains many sources of non-determinism, such as random identifiers, timestamps, and server-assigned session identifiers, which, when not handled properly, cause too much noise to correlate output changes with input changes. The main contribution of this work is to make black-box differential analysis practical when applied to modern Android apps. In particular, we show that the network-based non-determinism can often be explained and eliminated, and it is thus possible to reliably use variations in the network traffic as a strong signal to detect privacy leaks. We implemented this approach in a tool, called AGRIGENTO, and we evaluated it on more than one thousand Android apps. Our evaluation shows that our approach works well in practice and outperforms current state-of-the-art techniques. We conclude our study by discussing several case studies that show how popular apps and ad libraries currently exfiltrate data by using complex combinations of encoding and encryption mechanisms that other approaches fail to detect. Our results show that these apps and libraries seem to deliberately hide their data leaks from current approaches and clearly demonstrate the need for an obfuscation-resilient approach such as ours.
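The two-step idea in the abstract (build a baseline, then change a private input and look for deviations) can be illustrated with a minimal Python sketch. This is not AGRIGENTO's implementation: the flattened "field name to value" representation of a run, the function names `stable_fields` and `detect_leaks`, and the sample values are all assumptions made for illustration. The sketch also handles non-determinism in the crudest possible way, by ignoring any field that varies across baseline runs, whereas the abstract describes explaining and eliminating that non-determinism.

```python
from typing import Dict, List, Set

# Hypothetical representation: one run's traffic flattened to field -> value.
Run = Dict[str, str]


def stable_fields(baseline_runs: List[Run]) -> Dict[str, str]:
    """Return the fields that carried the same value in every baseline run."""
    first = baseline_runs[0]
    stable = {}
    for field, value in first.items():
        # A field that changes between identical-input runs is treated as noise.
        if all(run.get(field) == value for run in baseline_runs[1:]):
            stable[field] = value
    return stable


def detect_leaks(baseline_runs: List[Run], modified_run: Run) -> Set[str]:
    """Flag fields that were stable in the baseline but differ after a
    private input (e.g. the device ID) was changed."""
    stable = stable_fields(baseline_runs)
    return {f for f, v in stable.items() if modified_run.get(f) != v}


# Illustrative values only: "ts" and "sid" vary on every run, "payload" does not.
baseline = [
    {"ts": "1485900000", "sid": "a91f", "payload": "3q2+7w=="},
    {"ts": "1485900042", "sid": "77c0", "payload": "3q2+7w=="},
]
# Same app, same actions, but with a different device ID injected.
modified = {"ts": "1485900100", "sid": "0b3e", "payload": "yv66vg=="}

print(detect_leaks(baseline, modified))  # {'payload'}
```

Because the comparison is on raw values, the sketch flags `payload` without needing to decode or decrypt it, which is the property that makes the differential approach indifferent to how the private value was transformed before being sent.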

