{"title":"Benchmark Fuzzing for Android Taint Analyses","authors":"Stefan Schott, Felix Pauck","doi":"10.1109/SCAM55253.2022.00007","DOIUrl":null,"url":null,"abstract":"Benchmarking is the most often used technique to empirically evaluate software. To do so, benchmarks are often manually created when they are needed. Mainly two kinds of benchmarks are frequently employed: micro and real-world benchmarks. While micro benchmarks are most of the time handcrafted from scratch, real-world benchmarks are typically created by collecting available software from repositories or markets. Both types have their deficits. On the one hand, a handcrafted micro benchmark can only be of limited complexity, but the creator knows its ground-truth which is needed for precise evaluations. On the other hand, in case of a complex real-world benchmark, a ground-truth is unavailable in most cases. To bring together the best of both worlds we propose the concept of benchmark fuzzing, a three step procedure that allows for an automatic generation, execution and evaluation of benchmarks of configurable size and versatility. We implemented benchmark fuzzing in our novel Android taint analysis benchmark generation tool GenBenchDroid. Our evaluation performed on GenBenchDroidshows the benefits of benchmark fuzzing. We show that over-adaptation of benchmarks can broadly be decreased, scalability issues of analysis tools can be detected and combinations of analysis challenges that negatively impact analysis' accuracy can be identified. In addition, benchmark fuzzing allows to regenerate up-to-date versions of state-of-the-art micro and real-world benchmarks. Furthermore, our evaluation shows that the cost of benchmark fuzzing can be estimated and appears to be reasonable in regards of the advantages.","PeriodicalId":138287,"journal":{"name":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM55253.2022.00007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Benchmarking is the most frequently used technique to empirically evaluate software. To this end, benchmarks are often created manually when they are needed. Two kinds of benchmarks are mainly employed: micro and real-world benchmarks. While micro benchmarks are usually handcrafted from scratch, real-world benchmarks are typically created by collecting available software from repositories or markets. Both types have their deficits. On the one hand, a handcrafted micro benchmark can only be of limited complexity, but its creator knows the ground truth, which is needed for precise evaluations. On the other hand, for a complex real-world benchmark, a ground truth is unavailable in most cases. To bring together the best of both worlds, we propose the concept of benchmark fuzzing, a three-step procedure that allows for the automatic generation, execution, and evaluation of benchmarks of configurable size and versatility. We implemented benchmark fuzzing in our novel Android taint analysis benchmark generation tool GenBenchDroid. Our evaluation of GenBenchDroid shows the benefits of benchmark fuzzing: over-adaptation to benchmarks can broadly be decreased, scalability issues of analysis tools can be detected, and combinations of analysis challenges that negatively impact analysis accuracy can be identified. In addition, benchmark fuzzing allows regenerating up-to-date versions of state-of-the-art micro and real-world benchmarks. Furthermore, our evaluation shows that the cost of benchmark fuzzing can be estimated and appears reasonable with regard to its advantages.
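To make the three-step procedure concrete, the following is a minimal, hypothetical sketch of a benchmark-fuzzing loop. It is not GenBenchDroid's actual API: every class and method name below (BenchmarkCase, generate, runTaintAnalysis) is an assumption made for illustration, and the "analysis" is simulated rather than a real taint-analysis tool. The sketch only shows the structure the abstract describes: generate a benchmark with a known ground truth, execute an analysis on it, and evaluate the result against that ground truth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the generate-execute-evaluate loop from the abstract.
// All names are hypothetical; they do not reflect GenBenchDroid's real API.
public class BenchmarkFuzzingSketch {

    // A generated benchmark case: an app plus its ground truth, i.e. the
    // taint flows that were deliberately built into the app.
    record BenchmarkCase(String apkPath, List<String> expectedFlows) {}

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed for reproducible fuzzing runs

        for (int i = 0; i < 10; i++) {
            // Step 1: generation -- randomly combine analysis challenges
            // into an app of configurable size, recording the ground truth.
            BenchmarkCase benchmark = generate(rng);

            // Step 2: execution -- run the taint analysis under test.
            List<String> reportedFlows = runTaintAnalysis(benchmark.apkPath());

            // Step 3: evaluation -- compare the reported flows against the
            // known ground truth to obtain accuracy-style metrics.
            long truePositives = reportedFlows.stream()
                    .filter(benchmark.expectedFlows()::contains)
                    .count();
            System.out.printf("case %d: %d/%d expected flows found%n",
                    i, truePositives, benchmark.expectedFlows().size());
        }
    }

    // Hypothetical generator: picks a random number of taint flows and emits
    // an APK path together with the corresponding ground truth.
    static BenchmarkCase generate(Random rng) {
        int numFlows = 1 + rng.nextInt(5);
        List<String> flows = new ArrayList<>();
        for (int f = 0; f < numFlows; f++) {
            flows.add("source" + f + " -> sink" + f);
        }
        return new BenchmarkCase("/tmp/bench" + rng.nextInt(1000) + ".apk", flows);
    }

    // Placeholder for invoking an off-the-shelf analysis tool; here it merely
    // simulates an imperfect analysis that misses some of the injected flows.
    static List<String> runTaintAnalysis(String apkPath) {
        return List.of("source0 -> sink0");
    }
}
```

Because the generator records the ground truth at construction time, the evaluation step can compute exact true-positive counts even for large, randomly composed benchmarks, which is the property the abstract claims handcrafted micro benchmarks have but complex real-world benchmarks lack.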