{"title":"Benchmark Fuzzing for Android Taint Analyses","authors":"Stefan Schott, Felix Pauck","doi":"10.1109/SCAM55253.2022.00007","DOIUrl":null,"url":null,"abstract":"Benchmarking is the most often used technique to empirically evaluate software. To do so, benchmarks are often manually created when they are needed. Mainly two kinds of benchmarks are frequently employed: micro and real-world benchmarks. While micro benchmarks are most of the time handcrafted from scratch, real-world benchmarks are typically created by collecting available software from repositories or markets. Both types have their deficits. On the one hand, a handcrafted micro benchmark can only be of limited complexity, but the creator knows its ground-truth which is needed for precise evaluations. On the other hand, in case of a complex real-world benchmark, a ground-truth is unavailable in most cases. To bring together the best of both worlds we propose the concept of benchmark fuzzing, a three step procedure that allows for an automatic generation, execution and evaluation of benchmarks of configurable size and versatility. We implemented benchmark fuzzing in our novel Android taint analysis benchmark generation tool GenBenchDroid. Our evaluation performed on GenBenchDroidshows the benefits of benchmark fuzzing. We show that over-adaptation of benchmarks can broadly be decreased, scalability issues of analysis tools can be detected and combinations of analysis challenges that negatively impact analysis' accuracy can be identified. In addition, benchmark fuzzing allows to regenerate up-to-date versions of state-of-the-art micro and real-world benchmarks. Furthermore, our evaluation shows that the cost of benchmark fuzzing can be estimated and appears to be reasonable in regards of the advantages.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM55253.2022.00007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Benchmarking is the most frequently used technique to empirically evaluate software. To this end, benchmarks are often created manually when they are needed. Two kinds of benchmarks are mainly employed: micro and real-world benchmarks. While micro benchmarks are usually handcrafted from scratch, real-world benchmarks are typically created by collecting available software from repositories or markets. Both types have their deficits. On the one hand, a handcrafted micro benchmark can only be of limited complexity, but its creator knows the ground truth, which is needed for precise evaluations. On the other hand, for a complex real-world benchmark, a ground truth is unavailable in most cases. To bring together the best of both worlds, we propose the concept of benchmark fuzzing, a three-step procedure that allows for the automatic generation, execution, and evaluation of benchmarks of configurable size and versatility. We implemented benchmark fuzzing in our novel Android taint analysis benchmark generation tool GenBenchDroid. Our evaluation of GenBenchDroid shows the benefits of benchmark fuzzing: over-adaptation to benchmarks can broadly be decreased, scalability issues of analysis tools can be detected, and combinations of analysis challenges that negatively impact analysis accuracy can be identified. In addition, benchmark fuzzing allows regenerating up-to-date versions of state-of-the-art micro and real-world benchmarks. Furthermore, our evaluation shows that the cost of benchmark fuzzing can be estimated and appears reasonable with regard to its advantages.
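To make the three-step procedure concrete, the following is a minimal, hypothetical sketch of a benchmark-fuzzing loop. It is not GenBenchDroid's actual API: every class and method name below (BenchmarkCase, generate, runTaintAnalysis) is an assumption made for illustration, and the "analysis" is simulated rather than a real taint-analysis tool. The sketch only shows the structure the abstract describes: generate a benchmark with a known ground truth, execute an analysis on it, and evaluate the result against that ground truth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the generate-execute-evaluate loop from the abstract.
// All names are hypothetical; they do not reflect GenBenchDroid's real API.
public class BenchmarkFuzzingSketch {

    // A generated benchmark case: an app plus its ground truth, i.e. the
    // taint flows that were deliberately built into the app.
    record BenchmarkCase(String apkPath, List<String> expectedFlows) {}

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducible fuzzing runs

        for (int i = 0; i < 10; i++) {
            // Step 1: generation -- randomly combine analysis challenges
            // into an app of configurable size, recording the ground truth.
            BenchmarkCase benchmark = generate(rng);

            // Step 2: execution -- run the taint analysis under test.
            List<String> reportedFlows = runTaintAnalysis(benchmark.apkPath());

            // Step 3: evaluation -- compare the reported flows against the
            // known ground truth to obtain accuracy-style metrics.
            long truePositives = reportedFlows.stream()
                    .filter(benchmark.expectedFlows()::contains)
                    .count();
            System.out.printf("case %d: %d/%d expected flows found%n",
                    i, truePositives, benchmark.expectedFlows().size());
        }
    }

    // Hypothetical generator: picks a random number of taint flows and emits
    // an APK path together with the corresponding ground truth.
    static BenchmarkCase generate(Random rng) {
        int numFlows = 1 + rng.nextInt(5);
        List<String> flows = new ArrayList<>();
        for (int f = 0; f < numFlows; f++) {
            flows.add("source" + f + " -> sink" + f);
        }
        return new BenchmarkCase("/tmp/bench" + rng.nextInt(1000) + ".apk", flows);
    }

    // Placeholder for invoking an off-the-shelf analysis tool; here it merely
    // simulates an imperfect analysis that misses some of the injected flows.
    static List<String> runTaintAnalysis(String apkPath) {
        return List.of("source0 -> sink0");
    }
}
```

Because the generator records the ground truth at construction time, the evaluation step can compute exact true-positive counts even for large, randomly composed benchmarks, which is the property the abstract claims handcrafted micro benchmarks have but complex real-world benchmarks lack.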