NAUTILUS: Fishing for Deep Bugs with Grammars

Proceedings 2019 Network and Distributed System Security Symposium. Pub Date: 2019-02-01. DOI: 10.14722/ndss.2019.23412

Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, A. Sadeghi, D. Teuchert


Citations: 171

Abstract

Fuzz testing is a well-known method for efficiently identifying bugs in programs. Unfortunately, when programs that require highly structured inputs, such as interpreters, are fuzzed, many fuzzing methods struggle to pass the syntax checks. Interpreters often process inputs in multiple stages: syntactic correctness is checked first, then semantic correctness. Only if both checks pass is the interpreted code executed. This prevents fuzzers from executing "deeper" (and hence potentially more interesting) code. Typically, two valid inputs that lead to the execution of different features in the target program are separated by too many mutations for simple mutation-based fuzzers to bridge: small changes such as bit flips usually only lead to the execution of error paths in the parsing engine. So-called grammar fuzzers are able to pass the syntax checks by using context-free grammars. Feedback can significantly increase the efficiency of fuzzing engines and is commonly used in state-of-the-art mutational fuzzers that do not use grammars. Yet current grammar fuzzers do not make use of code coverage, i.e., they do not know whether any input triggers new functionality. In this paper, we propose NAUTILUS, a method to efficiently fuzz programs that require highly structured inputs by combining the use of grammars with the use of code coverage feedback. This allows us to recombine aspects of interesting inputs and to increase the probability that any generated input will be syntactically and semantically correct. We implemented a proof-of-concept fuzzer that we tested on multiple targets, including ChakraCore (the JavaScript engine of Microsoft Edge), PHP, mruby, and Lua. NAUTILUS identified multiple bugs in all of the targets: seven in mruby, three in PHP, two in ChakraCore, and one in Lua. Reporting these bugs was rewarded with a total of 2600 USD, and six CVEs were assigned. Our experiments show that combining context-free grammars and feedback-driven fuzzing significantly outperforms state-of-the-art approaches like AFL by an order of magnitude and grammar fuzzers by more than a factor of two when measuring code coverage.
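The idea described in the abstract, generating inputs from a grammar and keeping those that reach new code, can be illustrated with a minimal sketch. This is not the NAUTILUS implementation: the toy grammar, the `toy_target` function (which only simulates coverage by returning a set of branch names), and all other names here are hypothetical and chosen for illustration. The sketch assumes inputs are kept as derivation trees so that a mutation can replace one subtree with a fresh one derived from the same nonterminal, which keeps every mutated input syntactically valid.

```python
import random

# Toy context-free grammar (hypothetical, not from the paper):
# each nonterminal maps to a list of productions.
GRAMMAR = {
    "EXPR": [["TERM"], ["EXPR", "+", "TERM"], ["EXPR", "*", "TERM"]],
    "TERM": [["NUM"], ["(", "EXPR", ")"]],
    "NUM": [["0"], ["1"], ["7"]],
}

def generate(symbol, rng, depth=0, max_depth=6):
    """Build a derivation tree (symbol, children) by expanding rules at random."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal symbol, no children
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        rules = [min(rules, key=len)]  # force the shortest rule so recursion ends
    rule = rng.choice(rules)
    return (symbol, [generate(s, rng, depth + 1, max_depth) for s in rule])

def unparse(tree):
    """Turn a derivation tree back into the input string."""
    symbol, children = tree
    if symbol not in GRAMMAR:
        return symbol
    return "".join(unparse(child) for child in children)

def nodes(tree, path=()):
    """Yield (path, symbol) for every nonterminal node in the tree."""
    symbol, children = tree
    if symbol in GRAMMAR:
        yield path, symbol
        for i, child in enumerate(children):
            yield from nodes(child, path + (i,))

def replace(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    symbol, children = tree
    i = path[0]
    new_child = replace(children[i], path[1:], new_subtree)
    return (symbol, children[:i] + [new_child] + children[i + 1:])

def mutate(tree, rng):
    """Subtree mutation: regenerate one random nonterminal from the grammar."""
    path, symbol = rng.choice(list(nodes(tree)))
    return replace(tree, path, generate(symbol, rng, depth=len(path)))

def toy_target(text):
    """Stand-in for an instrumented program: returns the 'branches' it hit."""
    covered = {"parse_ok"}
    if "+" in text:
        covered.add("add")
    if "*" in text:
        covered.add("mul")
    if "(" in text:
        covered.add("paren")
    if "7*" in text:
        covered.add("seven_times")
    if text.count("(") >= 3:
        covered.add("deep_nesting")
    return covered

def fuzz(iterations, seed=0):
    """Feedback loop: keep only inputs that reach previously unseen branches."""
    rng = random.Random(seed)
    queue, seen = [], set()
    for _ in range(iterations):
        if queue and rng.random() < 0.8:
            tree = mutate(rng.choice(queue), rng)  # mutate a saved input
        else:
            tree = generate("EXPR", rng)  # fresh input from the grammar
        coverage = toy_target(unparse(tree))
        if not coverage <= seen:  # at least one new branch was covered
            seen |= coverage
            queue.append(tree)
    return queue, seen
```

Calling `fuzz(200, seed=1)` returns the queue of saved derivation trees and the set of branch names reached. Because the queue only grows when coverage grows, later mutations start from inputs that already exercise more of the target, which is the intuition behind combining grammars with coverage feedback. A real fuzzer would obtain coverage from compile-time or binary instrumentation rather than from string checks.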



