Grammar-agnostic symbolic execution by token symbolization

Weiyu Pan, Zhenbang Chen, Guofeng Zhang, Yunlai Luo, Yufeng Zhang, Ji Wang
Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2021), published 2021-07-11. DOI: 10.1145/3460319.3464845 (https://doi.org/10.1145/3460319.3464845)
Citations: 1

Abstract

Parsing code is pervasive in software, and symbolic execution of complex parsing programs is challenging. Inputs generated by symbolic execution with byte-level symbolization are usually rejected by the parsing program, which undermines both the effectiveness and the efficiency of symbolic execution. Complex parsing programs usually check the input grammar at the token level, and a token sequence represents one case of the input grammar. Based on this observation, we propose grammar-agnostic symbolic execution, which automatically generates token sequences to test complex parsing programs effectively and efficiently. The key idea of our method is to symbolize tokens instead of input bytes, which improves the efficiency of symbolic execution. Technically, we propose a novel two-stage algorithm: the first stage collects the byte-level constraints on token values; the second stage uses token symbolization together with the constraints collected in the first stage to generate program inputs that are more likely to pass the parsing code. We have implemented our method on a concolic execution engine based on Java PathFinder (JPF). Extensive experiments on real-world Java parsing programs demonstrate its effectiveness and efficiency in testing complex parsing programs: our method detects 6 previously unknown bugs in the benchmark programs and finds the same bugs with speedups of orders of magnitude.
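The contrast between byte-level and token-level input generation can be illustrated with a toy Java sketch. This is not the authors' JPF-based implementation: the expression grammar, the `Kind` enum, and the methods `stage1`, `stage2`, and `countAcceptedBytes` are hypothetical, and exhaustive enumeration stands in for the constraint solver a real symbolic execution engine would use. Under those assumptions, `stage1` records which byte values the lexer accepts for each token kind (an extensional stand-in for byte-level constraints on token values), and `stage2` enumerates token sequences, concretizes each token with a recorded byte, and keeps the inputs the parser accepts.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy sketch only (hypothetical names and grammar): exhaustive enumeration
// stands in for the constraint solver used by real symbolic execution.
public class TokenSymbolizationSketch {
    enum Kind { NUM, PLUS, LPAREN, RPAREN }

    // Toy lexer: the token kind a single byte belongs to, or null if none.
    static Kind classify(char c) {
        if (c >= '0' && c <= '9') return Kind.NUM;
        if (c == '+') return Kind.PLUS;
        if (c == '(') return Kind.LPAREN;
        if (c == ')') return Kind.RPAREN;
        return null;
    }

    // Recursive-descent parser for: expr := term ('+' term)* ; term := NUM | '(' expr ')'
    // Spaces separate tokens; any other unexpected byte makes the input invalid.
    private static final class Parser {
        private final String s;
        private int i = 0;
        Parser(String s) { this.s = s; }
        private void skipSpaces() { while (i < s.length() && s.charAt(i) == ' ') i++; }
        private boolean eat(char c) {
            skipSpaces();
            if (i < s.length() && s.charAt(i) == c) { i++; return true; }
            return false;
        }
        private boolean term() {
            if (eat('(')) return expr() && eat(')');
            int start = i; // eat() has already skipped leading spaces
            while (i < s.length() && classify(s.charAt(i)) == Kind.NUM) i++;
            return i > start;
        }
        private boolean expr() {
            if (!term()) return false;
            while (eat('+')) if (!term()) return false;
            return true;
        }
        boolean parseAll() {
            boolean ok = expr();
            skipSpaces();
            return ok && i == s.length();
        }
    }

    static boolean parses(String s) { return new Parser(s).parseAll(); }

    // Stage 1 stand-in: for each token kind, the byte values the lexer accepts
    // (an extensional version of a byte-level constraint on token values).
    static Map<Kind, List<Character>> stage1() {
        Map<Kind, List<Character>> witnesses = new EnumMap<>(Kind.class);
        for (char c = 0; c < 256; c++) {
            Kind k = classify(c);
            if (k != null) witnesses.computeIfAbsent(k, x -> new ArrayList<>()).add(c);
        }
        return witnesses;
    }

    // Stage 2 stand-in: enumerate token sequences (not byte strings) up to maxLen,
    // concretize each token with its first recorded byte, keep accepted inputs.
    static List<String> stage2(Map<Kind, List<Character>> witnesses, int maxLen) {
        List<String> accepted = new ArrayList<>();
        List<List<Kind>> frontier = new ArrayList<>();
        frontier.add(new ArrayList<>());
        for (int len = 1; len <= maxLen; len++) {
            List<List<Kind>> next = new ArrayList<>();
            for (List<Kind> prefix : frontier) {
                for (Kind k : Kind.values()) {
                    List<Kind> seq = new ArrayList<>(prefix);
                    seq.add(k);
                    next.add(seq);
                    String input = seq.stream()
                            .map(t -> String.valueOf(witnesses.get(t).get(0)))
                            .collect(Collectors.joining(" "));
                    if (parses(input)) accepted.add(input);
                }
            }
            frontier = next;
        }
        return accepted;
    }

    // Byte-level baseline: parser-accepted strings among all byte strings of
    // length 1..maxLen that extend prefix.
    static long countAcceptedBytes(String prefix, int maxLen) {
        long n = 0;
        for (char c = 0; c < 256; c++) {
            String s = prefix + c;
            if (parses(s)) n++;
            if (maxLen > 1) n += countAcceptedBytes(s, maxLen - 1);
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> tokenInputs = stage2(stage1(), 3); // 4 + 16 + 64 = 84 candidates
        long byteAccepted = countAcceptedBytes("", 2);  // 256 + 65536 = 65792 candidates
        System.out.println("token-level: " + tokenInputs.size() + " of 84 accepted " + tokenInputs);
        System.out.println("byte-level: " + byteAccepted + " of 65792 accepted");
    }
}
```

On this toy grammar every `stage2` candidate is assembled from well-formed tokens, so a larger fraction of candidates is accepted by the parser than in the byte-level enumeration of `countAcceptedBytes`. A real engine would instead solve path constraints over symbolic tokens and could pick any byte value satisfying the stage-one constraints, not just the first one recorded.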