Coverage-Guided Learning-Assisted Grammar-Based Fuzzing

2019 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Pub Date: 2019-04-01, DOI: 10.1109/ICSTW.2019.00065

Yuma Jitsunari, Yoshitaka Arahori

Citations: 5

Abstract

Grammar-based fuzzing is known to be an effective technique for checking for security vulnerabilities in programs, such as parsers, that take complex structured inputs. Unfortunately, most existing grammar-based fuzzers require considerable manual effort to write complex input grammars, which hinders their practical use. To address this problem, recently proposed approaches use machine learning to automatically acquire a generative model for structured inputs conforming to a complex grammar. Even such approaches, however, have major limitations: they fail to learn a generative model for instruction sequences, and they cannot achieve good coverage of instruction-parsing code. To overcome these limitations, this paper proposes a collection of techniques for enhancing learning-assisted grammar-based fuzzing. Our approach enables learning a generative model for instruction sequences by training a hybrid character/token-level recursive neural network. In addition, we exploit coverage metrics gathered during previous fuzzing runs to efficiently refine (or fine-tune) the learnt model so that it generates new inputs that induce high coverage. Our experiments with a real PDF parser show that our approach succeeded in generating new sequences of instructions (in PDF page streams) that induce better code coverage (of the PDF parser) than state-of-the-art learning-assisted grammar-based fuzzers.
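
The abstract does not spell out how the hybrid character/token-level input or the coverage-guided refinement is implemented, so the following Python is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that "hybrid" means page-stream instruction names are emitted as whole tokens while all other text is emitted character by character, and that refinement data is chosen by keeping generated inputs that reach coverage points not seen in earlier runs. The operator set `OPERATORS`, the functions `hybrid_tokenize` and `select_for_finetuning`, and the `run_target` callback are all hypothetical names introduced here.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical, tiny subset of PDF page-stream operators (illustration only).
OPERATORS = {"BT", "ET", "Tf", "Td", "Tj", "re", "q", "Q", "cm"}


def hybrid_tokenize(stream: str) -> List[str]:
    """Emit known operators as single tokens and everything else as characters.

    Simplification: string literals are not parsed, so an operator name that
    appears inside a literal is also emitted as one token.
    """
    tokens: List[str] = []
    # The capture group keeps whitespace runs, so joining the tokens
    # reproduces the original stream exactly.
    for piece in re.split(r"(\s+)", stream):
        if piece in OPERATORS:
            tokens.append(piece)
        else:
            tokens.extend(piece)  # one token per character
    return tokens


def select_for_finetuning(
    candidates: Iterable[str],
    run_target: Callable[[str], Set[int]],
    seen: Set[int],
) -> Tuple[List[str], Set[int]]:
    """Keep candidates that reach coverage points not seen in earlier runs.

    `run_target` is an assumed callback that runs the program under test on
    one input and returns the set of coverage point IDs it hit.
    """
    kept: List[str] = []
    covered = set(seen)
    for cand in candidates:
        new_points = run_target(cand) - covered
        if new_points:
            kept.append(cand)
            covered |= new_points
    return kept, covered
```

In such a setup, the inputs returned by `select_for_finetuning` would serve as additional training data for the next refinement round of the generative model, and the returned coverage set would be passed back in as `seen`.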
