NEZHA: Efficient Domain-Independent Differential Testing

2017 IEEE Symposium on Security and Privacy (SP). Pub Date: 2017-05-22. DOI: 10.1109/SP.2017.27

Theofilos Petsios, Adrian Tang, S. Stolfo, A. Keromytis, S. Jana

Pages: 615-632

Citations: 88

Abstract

Differential testing uses similar programs as cross-referencing oracles to find semantic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Unfortunately, existing differential testing tools are domain-specific and inefficient, requiring large numbers of test inputs to find a single bug. In this paper, we address these issues by designing and implementing NEZHA, an efficient input-format-agnostic differential testing framework. The key insight behind NEZHA's design is that current tools generate inputs by simply borrowing techniques designed for finding crash or memory corruption bugs in individual programs (e.g., maximizing code coverage). By contrast, NEZHA exploits the behavioral asymmetries between multiple test programs to focus on inputs that are more likely to trigger semantic bugs. We introduce the notion of δ-diversity, which summarizes the observed asymmetries between the behaviors of multiple test applications. Based on δ-diversity, we design two efficient domain-independent input generation mechanisms for differential testing, one gray-box and one black-box. We demonstrate that both of these input generation schemes are significantly more efficient than existing tools at finding semantic bugs in real-world, complex software. NEZHA's average rate of finding differences is 52 times and 27 times higher than that of Frankencerts and Mucerts, respectively, two popular domain-specific differential testing tools that check SSL/TLS certificate validation implementations. Moreover, performing differential testing with NEZHA yields 6 times more semantic bugs per tested input than adapting state-of-the-art general-purpose fuzzers like American Fuzzy Lop (AFL) to differential testing by running them on individual test programs for input generation. NEZHA discovered 778 unique, previously unknown discrepancies across a wide variety of applications (ELF and XZ parsers, PDF viewers and SSL/TLS libraries), many of which constitute previously unknown critical security vulnerabilities. In particular, we found two critical evasion attacks against ClamAV, allowing arbitrary malicious ELF/XZ files to evade detection. The discrepancies NEZHA found in the X.509 certificate validation implementations of the tested SSL/TLS libraries range from mishandling certain types of KeyUsage extensions to incorrect acceptance of specially crafted expired certificates, enabling man-in-the-middle attacks. All of our reported vulnerabilities were confirmed and fixed within a week of the date of reporting.
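The idea of using several programs as cross-referencing oracles, and of steering input generation by the asymmetries observed between them, can be illustrated with a small black-box sketch. This is not NEZHA's implementation: the abstract does not define δ-diversity precisely, so the sketch assumes a simple stand-in in which an input is "interesting" when the tuple of return codes across all programs has not been seen before. The two toy validators (`strict_check`, `lenient_check`), the byte-level `mutate` function, and every other name here are hypothetical and exist only for illustration.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Program = Callable[[bytes], int]          # returns 0 = accept, 1 = reject
Finding = Tuple[bytes, Tuple[int, ...]]


def strict_check(data: bytes) -> int:
    """Toy program A (hypothetical): accept only non-empty ASCII digit strings."""
    return 0 if data.isdigit() else 1


def lenient_check(data: bytes) -> int:
    """Toy program B (hypothetical): same rule, but ignores surrounding whitespace."""
    return 0 if data.strip().isdigit() else 1


def output_tuple(programs: Sequence[Program], data: bytes) -> Tuple[int, ...]:
    """Run every program on the same input and collect the return codes."""
    return tuple(p(data) for p in programs)


def is_discrepancy(outputs: Tuple[int, ...]) -> bool:
    """A semantic difference: the programs do not all agree on this input."""
    return len(set(outputs)) > 1


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Assumed mutator: overwrite, insert, or delete a single byte."""
    buf = bytearray(data)
    op = rng.choice(("overwrite", "insert", "delete"))
    if op == "overwrite" and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif op == "insert":
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif op == "delete" and buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def differential_fuzz(programs: Sequence[Program], seeds: Sequence[bytes],
                      iterations: int, seed: int = 0) -> List[Finding]:
    """Keep inputs whose output tuple is new; report those where programs disagree."""
    if not seeds:
        raise ValueError("at least one seed input is required")
    rng = random.Random(seed)
    corpus: List[bytes] = []
    seen: Set[Tuple[int, ...]] = set()
    findings: List[Finding] = []

    def consider(data: bytes) -> None:
        outputs = output_tuple(programs, data)
        if outputs in seen:
            return                      # no new joint behavior, discard
        seen.add(outputs)
        corpus.append(data)             # novel behavior: keep for further mutation
        if is_discrepancy(outputs):
            findings.append((data, outputs))

    for s in seeds:
        consider(s)
    for _ in range(iterations):
        consider(mutate(rng.choice(corpus), rng))
    return findings


if __name__ == "__main__":
    for data, outputs in differential_fuzz([strict_check, lenient_check],
                                           [b"12"], iterations=2000):
        print(repr(data), outputs)
```

The design point the sketch tries to capture is that the feedback signal is computed jointly over all programs rather than per program: an input is retained because it changes how the programs relate to each other, not because it increases coverage in any single one.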
