{"title":"DiPri: Distance-based Seed Prioritization for Greybox Fuzzing","authors":"Ruixiang Qian, Quanjun Zhang, Chunrong Fang, Ding Yang, Shun Li, Binyu Li, Zhenyu Chen","doi":"10.1145/3654440","DOIUrl":null,"url":null,"abstract":"<p>Greybox fuzzing is a powerful testing technique. Given a set of initial seeds, greybox fuzzing continuously generates new test inputs to execute the program under test and drives executions with code coverage as feedback. Seed prioritization is an important step of greybox fuzzing that helps greybox fuzzing choose promising seeds for input generation in priority. However, mainstream greybox fuzzers like AFL++ and Zest tend to neglect the importance of seed prioritization. They may pick seeds plainly according to the sequential order of the seeds being queued or an order produced with a random-based approach, which may consequently degrade their performance in exploring code and exposing bugs. In the meantime, existing state-of-the-art techniques like Alphuzz and K-Scheduler adopt complex strategies to schedule seeds. Although powerful, such strategies also inevitably incur great overhead and will reduce the scalability of the proposed technique. </p><p>In this paper, we propose a novel distance-based seed prioritization approach named <span>DiPri</span> to facilitate greybox fuzzing. </p><p>Specifically, <span>DiPri</span> evaluates the queued seeds according to seed distances and chooses the outlier ones, which are the farthest from the others, in priority to improve the probabilities of discovering previously unexplored code regions. To make a profound evaluation of <span>DiPri</span>, we prototype <span>DiPri</span> on AFL++ and conduct large-scale experiments with four baselines and 24 C/C++ fuzz targets, where eight are from widely adopted real-world projects, eight are from the coverage-based benchmark FuzzBench, and eight are from the bug-based benchmark Magma. The results obtained through a fuzzing exceeding 50,000 CPU hours suggest that <span>DiPri</span> can (1) insignificantly influence the host fuzzer’s capability of code coverage by slightly improving the branch coverage on the eight targets from real-world projects and slightly reducing the branch coverage on the eight targets from FuzzBench, and (2) improve the host fuzzer’s capability of finding bugs by triggering five more Magma bugs. Besides the evaluation with the three C/C++ benchmarks, we integrate <span>DiPri</span> into the Java fuzzer Zest and conduct experiments on a Java benchmark composed of five real-world programs for more than 8,000 CPU hours to empirically study the scalability of <span>DiPri</span>. The results with the Java benchmark demonstrate that <span>DiPri</span> is pretty scalable and can help the host fuzzer find bugs more consistently.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"30 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3654440","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Greybox fuzzing is a powerful testing technique. Given a set of initial seeds, a greybox fuzzer continuously generates new test inputs to execute the program under test, driving execution with code coverage as feedback. Seed prioritization is an important step of greybox fuzzing that helps the fuzzer select promising seeds for input generation first. However, mainstream greybox fuzzers such as AFL++ and Zest tend to neglect the importance of seed prioritization: they may pick seeds simply in the order in which they were queued, or in an order produced by a randomized approach, which can degrade their performance in exploring code and exposing bugs. Meanwhile, existing state-of-the-art techniques such as Alphuzz and K-Scheduler adopt complex strategies to schedule seeds. Although powerful, such strategies inevitably incur substantial overhead and reduce the scalability of these techniques.
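To make the role of seed selection concrete, the following is a minimal sketch of a coverage-guided greybox fuzzing loop, not the AFL++ or Zest implementation; the `execute` and `mutate` callables and the edge-set coverage model are illustrative assumptions. The seed-pick step at the top of the loop is exactly what seed prioritization refines.

```python
def fuzz_loop(seed_queue, execute, mutate, budget):
    """Sketch of a coverage-guided greybox fuzzing loop (illustrative names only)."""
    covered = set()
    for _ in range(budget):
        if not seed_queue:
            break
        seed = seed_queue.pop(0)         # plain sequential pick: the step seed prioritization refines
        seed_queue.append(seed)          # requeue round-robin
        test_input = mutate(seed)        # derive a new input from the chosen seed
        new_edges = execute(test_input)  # run the target; assume it returns the set of covered edge ids
        if new_edges - covered:          # coverage feedback: the input reached new code
            covered |= new_edges
            seed_queue.append(test_input)  # keep coverage-increasing inputs as new seeds
    return covered
```

A run might look like `fuzz_loop(list(initial_seeds), run_target, flip_bytes, budget=10_000)`, where `run_target` and `flip_bytes` are hypothetical, target-specific helpers.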
In this paper, we propose a novel distance-based seed prioritization approach named DiPri to facilitate greybox fuzzing.
Specifically, DiPri evaluates the queued seeds according to seed distances and prioritizes the outlier seeds, i.e., those farthest from the others, to improve the probability of discovering previously unexplored code regions. To evaluate DiPri thoroughly, we prototype it on AFL++ and conduct large-scale experiments with four baselines and 24 C/C++ fuzz targets, of which eight are from widely adopted real-world projects, eight are from the coverage-based benchmark FuzzBench, and eight are from the bug-based benchmark Magma. The results, obtained from fuzzing campaigns exceeding 50,000 CPU hours, suggest that DiPri (1) has only an insignificant influence on the host fuzzer's code-coverage capability, slightly improving branch coverage on the eight real-world targets and slightly reducing it on the eight FuzzBench targets, and (2) improves the host fuzzer's bug-finding capability, triggering five more Magma bugs. Besides the evaluation on the three C/C++ benchmarks, we integrate DiPri into the Java fuzzer Zest and conduct experiments on a Java benchmark composed of five real-world programs for more than 8,000 CPU hours to empirically study the scalability of DiPri. The results on the Java benchmark demonstrate that DiPri is highly scalable and helps the host fuzzer find bugs more consistently.
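Below is a minimal sketch of the outlier-first, distance-based selection described above, not the authors' implementation: it assumes each queued seed has a numeric feature vector (e.g., a coverage profile) and uses plain Euclidean distance, whereas DiPri's actual distance measure and scheduling details may differ.

```python
import math

def dipri_style_pick(seeds, features):
    """Pick the queued seed farthest from all others (distance-based outlier first).

    seeds    -- list of queued seeds
    features -- parallel list of numeric vectors, one per seed (assumed representation)
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Score each seed by its summed distance to every other queued seed.
    scores = [
        sum(dist(features[i], features[j]) for j in range(len(seeds)) if j != i)
        for i in range(len(seeds))
    ]
    # Prioritize the outlier: the seed farthest from the rest of the queue.
    return seeds[max(range(len(seeds)), key=scores.__getitem__)]
```

A host fuzzer would invoke such a routine whenever it needs the next seed to mutate, e.g., `next_seed = dipri_style_pick(queue, coverage_vectors)`, with `queue` and `coverage_vectors` being hypothetical bookkeeping structures.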
Journal Introduction
Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.