Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

arXiv - CS - Programming Languages. Pub Date: 2024-08-26. DOI: arxiv-2408.14017

Aaron Bembenek (University of Melbourne), Michael Greenberg (Stevens Institute of Technology), Stephen Chong (Harvard University)


Abstract

By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping, a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.
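To make the contrast between the two traversal orders concrete, the following is a minimal, single-threaded Python sketch on a toy reachability program (`reach(X,Y) :- edge(X,Y).` and `reach(X,Z) :- reach(X,Y), edge(Y,Z).`). It is an illustration under stated assumptions, not the paper's algorithm: the actual eager evaluation is concurrent and handles full Formulog programs with SMT calls, none of which is modeled here. The function names `semi_naive_reach` and `eager_reach` and the toy program are hypothetical.

```python
from collections import defaultdict


def _successors(edges):
    """Index edge(Y, Z) facts by Y for the join in the recursive rule."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return succ


def semi_naive_reach(edges):
    """Round-based (breadth-first) evaluation: each round joins only the
    facts that were newly derived in the previous round (the delta)."""
    succ = _successors(edges)
    reach = set(edges)
    delta = set(edges)
    while delta:
        new = set()
        for a, b in delta:
            for c in succ.get(b, ()):
                if (a, c) not in reach:
                    new.add((a, c))
        reach |= new
        delta = new
    return reach


def eager_reach(edges):
    """Depth-first evaluation (toy stand-in for the eager idea): a newly
    derived fact is pushed on a stack and its consequences are pursued
    before older pending facts are revisited."""
    succ = _successors(edges)
    reach = set()
    stack = list(edges)
    while stack:
        a, b = stack.pop()
        if (a, b) in reach:
            continue
        reach.add((a, b))
        for c in succ.get(b, ()):
            if (a, c) not in reach:
                stack.append((a, c))
    return reach


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4), (4, 2)]
    # Both orders derive the same set of facts; only the order differs.
    print(semi_naive_reach(edges) == eager_reach(edges))
```

Both functions compute the same fixpoint; the difference is only the order in which facts are derived. In this sketch the depth-first variant has no per-round barrier, which is the kind of property that could let work (such as SMT queries in Formulog) be issued as soon as a fact appears, though the sketch itself makes no performance claim.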
