Boosting Fuzzer Efficiency: An Information Theoretic Perspective

IF 11.1 · CAS Tier 3 (Computer Science) · JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Marcel Böhme, Valentin J. M. Manès, Sang Kil Cha
{"title":"Boosting Fuzzer Efficiency: An Information Theoretic Perspective","authors":"Marcel Böhme, Valentin J. M. Manès, Sang Kil Cha","doi":"10.1145/3611019","DOIUrl":null,"url":null,"abstract":"In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P : What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals \"some amount\" of information about P 's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC into the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.","PeriodicalId":10594,"journal":{"name":"Communications of the ACM","volume":"37 7","pages":"0"},"PeriodicalIF":11.1000,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications of the ACM","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3611019","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 3

Abstract

In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC into the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.
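To make the idea concrete, the following is a minimal sketch of an entropy-based power schedule. It assumes a per-seed record of how often each "species" (e.g., a coverage feature) was exercised by inputs generated from that seed, estimates the Shannon entropy of that distribution with Laplace smoothing, and gives each seed a share of fuzzing energy proportional to its estimated information. The names (seed_entropy, entropy_schedule, the example corpus) and the specific smoothing choice are hypothetical illustrations of the idea, not LIBFUZZER's ENTROPIC implementation.

    import math

    def seed_entropy(species_counts, global_species):
        """Laplace-smoothed estimate of the Shannon entropy (in nats) of the
        distribution over species (e.g., coverage features) exercised by
        inputs generated from one seed. Illustrative estimator only."""
        n = sum(species_counts.values())
        entropy = 0.0
        for count in species_counts.values():
            p = (count + 1) / (n + global_species)   # add-one smoothing
            entropy -= p * math.log(p)
        # Account for globally known species this seed has not exercised yet.
        unseen = global_species - len(species_counts)
        if unseen > 0:
            p0 = 1 / (n + global_species)
            entropy -= unseen * p0 * math.log(p0)
        return entropy

    def entropy_schedule(corpus, global_species):
        """Assign each seed a share of energy proportional to its estimated
        information, normalized over the whole corpus (sketch of an
        entropy-based power schedule)."""
        h = {seed: seed_entropy(counts, global_species)
             for seed, counts in corpus.items()}
        total = sum(h.values()) or 1.0
        return {seed: v / total for seed, v in h.items()}

    # Hypothetical corpus: per-seed counts of how often each coverage
    # feature was hit by inputs generated from that seed.
    corpus = {
        "seed_a": {"f1": 90, "f2": 5},              # skewed: reveals little
        "seed_b": {"f1": 30, "f3": 35, "f4": 30},   # diverse: reveals more
    }
    print(entropy_schedule(corpus, global_species=4))

In this toy example, the seed whose generated inputs spread more evenly over distinct coverage features ("seed_b") receives the larger share of energy, which is the intuition behind assigning more energy to seeds that maximize information.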
Source Journal
Communications of the ACM
Category: Engineering & Technology - Computer Science: Theory & Methods
CiteScore: 16.10
Self-citation rate: 0.40%
Articles published: 276
Review time: 6-12 weeks
Journal description: Communications of the ACM is the leading print and online publication for the computing and information technology fields. Read by computing's leading professionals worldwide, Communications is recognized as the most trusted and knowledgeable source of industry information for today's computing professional. Following the traditions of the Communications print magazine, which each month brings its readership of over 100,000 ACM members in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications, the Communications website brings topical and informative news and material to computing professionals each business day. ACM's membership includes the IT industry's most respected leaders and decision makers. Industry leaders have for more than 50 years used the monthly Communications of the ACM magazine as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The Communications website continues that practice.