On the Feasibility of Automated Built-in Function Modeling for PHP Symbolic Execution

Proceedings of the Web Conference 2021 · Pub Date: 2021-04-19 · DOI: 10.1145/3442381.3450002

Penghui Li, W. Meng, Kangjie Lu, Changhua Luo


Citations: 9

Abstract

Symbolic execution has been widely applied to detecting vulnerabilities in web applications. Modeling language-specific built-in functions is essential for symbolic execution. Since built-in functions tend to be complicated and are typically implemented in low-level languages, a common strategy is to manually translate them into the SMT-LIB language for constraint solving. Such translation requires an excessive amount of human effort and a deep understanding of the functions' behavior. An incorrect translation can invalidate the final results. The problem is aggravated in PHP applications because of their cross-language nature, i.e., the built-in functions are written in C, but the rest of the code is in PHP. In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions for symbolic execution. We synthesize C programs by transforming the constraint-solving task in PHP symbolic execution into a C-compliant format and integrating it with the C implementations of the built-in functions. We apply symbolic execution to the synthesized C program to find a feasible path, which gives a solution that can be applied to the original PHP constraints. In this way, we automate the modeling of built-in functions in PHP applications. We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results demonstrate that our automated method is more accurate, achieves higher function coverage, and can exploit a similar number of vulnerabilities. Our empirical analysis also shows that the manual and automated methods have different strengths, which complement each other in certain scenarios. Therefore, the best practice is to combine both of them to optimize the accuracy, correctness, and coverage of symbolic execution.

