Regression Greybox Fuzzing

Xiaogang Zhu, Marcel Böhme
{"title":"Regression Greybox Fuzzing","authors":"Xiaogang Zhu, Marcel Böhme","doi":"10.1145/3460120.3484596","DOIUrl":null,"url":null,"abstract":"What you change is what you fuzz! In an empirical study of all fuzzer-generated bug reports in OSSFuzz, we found that four in every five bugs have been introduced by recent code changes. That is, 77% of 23k bugs are regressions. For a newly added project, there is usually an initial burst of new reports at 2-3 bugs per day. However, after that initial burst, and after weeding out most of the existing bugs, we still get a constant rate of 3-4 bug reports per week. The constant rate can only be explained by an increasing regression rate. Indeed, the probability that a reported bug is a regression (i.e., we could identify the bug-introducing commit) increases from 20% for the first bug to 92% after a few hundred bug reports. In this paper, we introduce regression greybox fuzzing (RGF) a fuzzing approach that focuses on code that has changed more recently or more often. However, for any active software project, it is impractical to fuzz sufficiently each code commit individually. Instead, we propose to fuzz all commits simultaneously, but code present in more (recent) commits with higher priority. We observe that most code is never changed and relatively old. So, we identify means to strengthen the signal from executed code-of-interest. We also extend the concept of power schedules to the bytes of a seed and introduce Ant Colony Optimization to assign more energy to those bytes which promise to generate more interesting inputs. Our large-scale fuzzing experiment demonstrates the validity of our main hypothesis and the efficiency of regression greybox fuzzing. We conducted our experiments in a reproducible manner within Fuzzbench, an extensible fuzzer evaluation platform. Our experiments involved 3+ CPU-years worth of fuzzing campaigns and 20 bugs in 15 open-source C programs available on OSSFuzz.","PeriodicalId":135883,"journal":{"name":"Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3460120.3484596","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

Abstract

What you change is what you fuzz! In an empirical study of all fuzzer-generated bug reports in OSSFuzz, we found that four in every five bugs have been introduced by recent code changes. That is, 77% of 23k bugs are regressions. For a newly added project, there is usually an initial burst of new reports at 2-3 bugs per day. However, after that initial burst, and after weeding out most of the existing bugs, we still get a constant rate of 3-4 bug reports per week. The constant rate can only be explained by an increasing regression rate. Indeed, the probability that a reported bug is a regression (i.e., we could identify the bug-introducing commit) increases from 20% for the first bug to 92% after a few hundred bug reports. In this paper, we introduce regression greybox fuzzing (RGF) a fuzzing approach that focuses on code that has changed more recently or more often. However, for any active software project, it is impractical to fuzz sufficiently each code commit individually. Instead, we propose to fuzz all commits simultaneously, but code present in more (recent) commits with higher priority. We observe that most code is never changed and relatively old. So, we identify means to strengthen the signal from executed code-of-interest. We also extend the concept of power schedules to the bytes of a seed and introduce Ant Colony Optimization to assign more energy to those bytes which promise to generate more interesting inputs. Our large-scale fuzzing experiment demonstrates the validity of our main hypothesis and the efficiency of regression greybox fuzzing. We conducted our experiments in a reproducible manner within Fuzzbench, an extensible fuzzer evaluation platform. Our experiments involved 3+ CPU-years worth of fuzzing campaigns and 20 bugs in 15 open-source C programs available on OSSFuzz.
回归灰盒模糊
你所改变的就是你所模糊的!在对OSSFuzz中所有fuzzer生成的bug报告的实证研究中,我们发现每五个bug中就有四个是由最近的代码更改引入的。也就是说,23k个bug中有77%是回归。对于一个新添加的项目,通常每天会有2-3个bug的新报告。然而,在最初的爆发之后,在清除了大多数现有的bug之后,我们仍然会得到每周3-4个bug报告的恒定速率。不变的速率只能用不断增加的回归速率来解释。事实上,报告的bug是回归的概率(例如,我们可以识别引入bug的提交)从第一个bug的20%增加到几百个bug报告后的92%。在本文中,我们介绍了回归灰盒模糊测试(RGF),这是一种专注于最近或更频繁更改的代码的模糊测试方法。然而,对于任何活动的软件项目,对每个单独提交的代码进行充分模糊处理是不切实际的。相反,我们建议同时模糊所有提交,但是在更(最近)的提交中出现的代码具有更高的优先级。我们观察到,大多数代码从未更改过,而且相对较旧。因此,我们确定了从执行的兴趣代码中增强信号的方法。我们还将功率调度的概念扩展到种子的字节,并引入蚁群优化,将更多的能量分配给那些有望产生更多有趣输入的字节。我们的大规模模糊实验证明了我们主要假设的有效性和回归灰盒模糊的有效性。我们在Fuzzbench(一个可扩展的模糊器评估平台)中以可重复的方式进行了实验。我们的实验涉及3年多cpu的模糊测试活动和OSSFuzz上15个开源C程序中的20个bug。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信