{"title":"ENZZ: Effective N-gram coverage assisted fuzzing with nearest neighboring branch estimation","authors":"Xi Peng, Peng Jia, Ximing Fan, Jiayong Liu","doi":"10.1016/j.infsof.2024.107582","DOIUrl":null,"url":null,"abstract":"<div><div>Fuzzing is a highly effective approach for identifying software bugs and vulnerabilities. Among the various techniques employed, coverage-guided fuzzing stands out as particularly valuable, relying on tracing code coverage information. N-gram coverage is a coverage metric in gray-box fuzz testing, where the value of N determines the sensitivity of the coverage. Block and edge coverage can be represented as 0-gram and 1-gram, respectively. The value of N can range from 0 to infinity. However, the specific methodology for selecting the appropriate value of N is still an area yet to be explored.</div><div>This paper proposes an estimation method based on the nearest branch. We initially explained the role of N-gram in the execution paths of programs and elucidated the objective of selecting the value of N, which aims to cover the closest neighboring branches. Subsequently, based on this objective, we proposed a method for calculating N based on the closest neighboring branch and estimated N at the function level. Finally, in this paper, we designed a scheduling mechanism using Adversarial Multi-Armed Bandit model that automatically selects either the seeds generated by N-gram or the original queue seeds for fuzz testing.</div><div>We implement our approach in ENZZ based on AFL and compare it with other N-gram coverage fuzzers and the state-of-the-art path coverage-assisted fuzzer PathAFL. We find that ENZZ outperforms other N-gram fuzzers and PathAFL by achieving an average improvement of 5.57% and 4.38% on edge coverage, and it improves the efficiency of path-to-edge discovery by 31.5% and 26.1%, respectively, on 12 Google FuzzBench programs. It also finds more bugs than other N-gram fuzzers and three more real-world bugs than PathAFL.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"177 ","pages":"Article 107582"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584924001873","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Fuzzing is a highly effective approach for identifying software bugs and vulnerabilities. Among the various techniques employed, coverage-guided fuzzing stands out as particularly valuable, relying on tracing code coverage information. N-gram coverage is a coverage metric in gray-box fuzz testing, where the value of N determines the sensitivity of the coverage. Block and edge coverage can be represented as 0-gram and 1-gram, respectively. The value of N can range from 0 to infinity. However, the specific methodology for selecting the appropriate value of N is still an area yet to be explored.
This paper proposes an estimation method based on the nearest branch. We initially explained the role of N-gram in the execution paths of programs and elucidated the objective of selecting the value of N, which aims to cover the closest neighboring branches. Subsequently, based on this objective, we proposed a method for calculating N based on the closest neighboring branch and estimated N at the function level. Finally, in this paper, we designed a scheduling mechanism using Adversarial Multi-Armed Bandit model that automatically selects either the seeds generated by N-gram or the original queue seeds for fuzz testing.
We implement our approach in ENZZ based on AFL and compare it with other N-gram coverage fuzzers and the state-of-the-art path coverage-assisted fuzzer PathAFL. We find that ENZZ outperforms other N-gram fuzzers and PathAFL by achieving an average improvement of 5.57% and 4.38% on edge coverage, and it improves the efficiency of path-to-edge discovery by 31.5% and 26.1%, respectively, on 12 Google FuzzBench programs. It also finds more bugs than other N-gram fuzzers and three more real-world bugs than PathAFL.
期刊介绍:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal''s scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics,
• Software processes,
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premiere outlet for systematic literature studies in software engineering.