Mining sandboxes: Are we there yet?

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2018-03-20 DOI:10.1109/SANER.2018.8330231

Lingfeng Bao, Tien-Duy B. Le, D. Lo

{"title":"Mining sandboxes: Are we there yet?","authors":"Lingfeng Bao, Tien-Duy B. Le, D. Lo","doi":"10.1109/SANER.2018.8330231","DOIUrl":null,"url":null,"abstract":"The popularity of Android platform on mobile devices has attracted much attention from many developers and researchers, as well as malware writers. Recently, Jamrozik et al. proposed a technique to secure Android applications referred to as mining sandboxes. They used an automated test case generation technique to explore the behavior of the app under test and then extracted a set of sensitive APIs that were called. Based on the extracted sensitive APIs, they built a sandbox that can block access to APIs not used during testing. However, they only evaluated the proposed technique with benign apps but not investigated whether it was effective in detecting malicious behavior of malware that infects benign apps. Furthermore, they only investigated one test case generation tool (i.e., Droidmate) to build the sandbox, while many others have been proposed in the literature. In this work, we complement Jamrozik et al.'s work in two ways: (1) we evaluate the effectiveness of mining sandboxes on detecting malicious behaviors; (2) we investigate the effectiveness of multiple automated test case generation tools to mine sandboxes. To investigate effectiveness of mining sandboxes in detecting malicious behaviors, we make use of pairs of malware and benign app it infects. We build a sandbox based on sensitive APIs called by the benign app and check if it can identify malicious behaviors in the corresponding malware. To generate inputs to apps, we investigate five popular test case generation tools: Monkey, Droidmate, Droidbot, GUIRipper, and PUMA. We conduct two experiments to evaluate the effectiveness and efficiency of these test case generation tools on detecting malicious behavior. In the first experiment, we select 10 apps and allow test case generation tools to run for one hour; while in the second experiment, we select 102 pairs of apps and allow the test case generation tools to run for one minute. Our experiments highlight that 75.5%-77.2% of malware in our dataset can be uncovered by mining sandboxes — showing its power to protect Android apps. We also find that Droidbot performs best in generating test cases for mining sandboxes, and its effectiveness can be further boosted when coupled with other test case generation tools.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"20 1","pages":"445-455"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330231","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

Abstract

The popularity of Android platform on mobile devices has attracted much attention from many developers and researchers, as well as malware writers. Recently, Jamrozik et al. proposed a technique to secure Android applications referred to as mining sandboxes. They used an automated test case generation technique to explore the behavior of the app under test and then extracted a set of sensitive APIs that were called. Based on the extracted sensitive APIs, they built a sandbox that can block access to APIs not used during testing. However, they only evaluated the proposed technique with benign apps but not investigated whether it was effective in detecting malicious behavior of malware that infects benign apps. Furthermore, they only investigated one test case generation tool (i.e., Droidmate) to build the sandbox, while many others have been proposed in the literature. In this work, we complement Jamrozik et al.'s work in two ways: (1) we evaluate the effectiveness of mining sandboxes on detecting malicious behaviors; (2) we investigate the effectiveness of multiple automated test case generation tools to mine sandboxes. To investigate effectiveness of mining sandboxes in detecting malicious behaviors, we make use of pairs of malware and benign app it infects. We build a sandbox based on sensitive APIs called by the benign app and check if it can identify malicious behaviors in the corresponding malware. To generate inputs to apps, we investigate five popular test case generation tools: Monkey, Droidmate, Droidbot, GUIRipper, and PUMA. We conduct two experiments to evaluate the effectiveness and efficiency of these test case generation tools on detecting malicious behavior. In the first experiment, we select 10 apps and allow test case generation tools to run for one hour; while in the second experiment, we select 102 pairs of apps and allow the test case generation tools to run for one minute. Our experiments highlight that 75.5%-77.2% of malware in our dataset can be uncovered by mining sandboxes — showing its power to protect Android apps. We also find that Droidbot performs best in generating test cases for mining sandboxes, and its effectiveness can be further boosted when coupled with other test case generation tools.

查看原文本刊更多论文

挖矿沙箱:我们到了吗?

Android平台在移动设备上的流行吸引了许多开发人员和研究人员以及恶意软件编写者的关注。最近，Jamrozik等人提出了一种被称为挖掘沙箱的技术来保护Android应用程序。他们使用自动化测试用例生成技术来探索被测应用程序的行为，然后提取一组被调用的敏感api。基于提取的敏感api，他们构建了一个沙箱，可以阻止对测试期间未使用的api的访问。然而，他们只在良性应用中评估了所提出的技术，而没有调查它是否有效地检测感染良性应用的恶意软件的恶意行为。此外，他们只研究了一个测试用例生成工具(例如，Droidmate)来构建沙盒，而在文献中已经提出了许多其他工具。在这项工作中，我们从两个方面补充了Jamrozik等人的工作:(1)我们评估了挖掘沙箱在检测恶意行为方面的有效性;(2)我们研究了多个自动化测试用例生成工具在挖掘沙盒中的有效性。为了研究挖掘沙箱在检测恶意行为方面的有效性，我们利用了它感染的恶意软件和良性应用程序对。我们根据良性应用调用的敏感api构建沙盒，检查是否能识别相应恶意软件中的恶意行为。为了生成应用程序的输入，我们研究了五种流行的测试用例生成工具:Monkey、Droidmate、Droidbot、GUIRipper和PUMA。我们通过两个实验来评估这些测试用例生成工具在检测恶意行为方面的有效性和效率。在第一个实验中，我们选择了10个应用程序，并允许测试用例生成工具运行一个小时;而在第二个实验中，我们选择了102对应用程序，并允许测试用例生成工具运行一分钟。我们的实验强调，在我们的数据集中，75.5%-77.2%的恶意软件可以通过挖掘沙箱来发现，这显示了它保护Android应用程序的能力。我们还发现Droidbot在为挖掘沙箱生成测试用例方面表现最好，当与其他测试用例生成工具结合使用时，它的有效性可以进一步提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

自引率

0.00%

发文量