Learning to Explore Paths for Symbolic Execution

Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. Pub Date: 2021-11-12. DOI: 10.1145/3460120.3484813

Jingxuan He, Gishor Sivanrupan, Petar Tsankov, Martin T. Vechev


Citations: 10

Abstract

Symbolic execution is a powerful technique that can generate tests steering program execution into desired paths. However, the scalability of symbolic execution is often limited by path explosion, i.e., the number of symbolic states representing the paths under exploration quickly explodes as execution goes on. Therefore, the effectiveness of symbolic execution engines hinges on the ability to select and explore the right symbolic states. In this work, we propose a novel learning-based strategy, called Learch, able to effectively select promising states for symbolic execution to tackle the path explosion problem. Learch directly estimates the contribution of each state towards the goal of maximizing coverage within a time budget, as opposed to relying on manually crafted heuristics based on simple statistics as a crude proxy for the objective. Moreover, Learch leverages existing heuristics in training data generation and feature extraction, and can thus benefit from any new expert-designed heuristics. We instantiated Learch in KLEE, a widely adopted symbolic execution engine. We evaluated Learch on a diverse set of programs, showing that Learch is practically effective: it covers more code and detects more security violations than existing manual heuristics, as well as combinations of those heuristics. We also show that using tests generated by Learch as initial fuzzing seeds enables the popular fuzzer AFL to find more paths and security violations.
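The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use KLEE's API: the `State` class, the two-element feature vectors, the linear scorer, and its weights are all hypothetical stand-ins for a trained model that predicts how much each pending state would contribute to coverage within the time budget.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class State:
    """A pending symbolic state, described by hypothetical features."""
    name: str
    features: Sequence[float]


def linear_scorer(weights: Sequence[float]) -> Callable[[State], float]:
    """Stand-in for a trained model: a weighted sum of the features."""
    def score(state: State) -> float:
        return sum(w * x for w, x in zip(weights, state.features))
    return score


def select_state(states: Sequence[State],
                 score: Callable[[State], float]) -> State:
    """Pick the pending state with the highest predicted contribution."""
    if not states:
        raise ValueError("no pending states to select from")
    return max(states, key=score)


# Assumed example data: three pending states and made-up weights.
pending = [
    State("s1", [0.2, 3.0]),
    State("s2", [0.9, 1.0]),
    State("s3", [0.5, 0.5]),
]
scorer = linear_scorer([1.0, -0.1])
chosen = select_state(pending, scorer)
print(chosen.name)
```

In a real engine the scorer would be learned from training data and the features would be extracted from the engine's own state, which the abstract says can include the outputs of existing heuristics. The sketch only shows the shape of the decision: score every pending state, then explore the one with the best predicted payoff.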

