Flan: An Expressive and Efficient Datalog Compiler for Program Analysis

Supun Abeysinghe, Anxhelo Xhebraj, Tiark Rompf
Proceedings of the ACM on Programming Languages, pp. 2577–2609. Published 2024-01-05. DOI: 10.1145/3632928 (https://doi.org/10.1145/3632928)

Abstract

Datalog has gained prominence in program analysis due to its expressiveness and ease of use. Its generic fixpoint resolution algorithm over relational domains simplifies the expression of many complex analyses. The performance and scalability issues of early Datalog approaches have been addressed by tools such as Soufflé through specialized code generation. Still, while pure Datalog is expressive enough to support a wide range of analyses, there is a growing need for extensions to accommodate increasingly complex analyses. This has led to the development of various extensions, such as Flix, Datafun, and Formulog, which enhance Datalog with features like arbitrary lattices and SMT constraints. Most of these extensions recognize the need for full interoperability between Datalog and a full-fledged programming language, a functionality that high-performance systems like Soufflé lack. Specifically, in most cases, they construct languages from scratch with first-class Datalog support, allowing greater flexibility. However, this flexibility often comes at the cost of performance due to the conflicting requirements of prioritizing modularity and abstraction over efficiency. Consequently, achieving both flexibility and compilation to highly performant specialized code poses a significant challenge. In this work, we reconcile the competing demands of expressiveness and performance with Flan, a Datalog compiler fully embedded in Scala that leverages multi-stage programming to generate specialized code for enhanced performance. Our approach combines the flexibility of Flix with Soufflé’s performance, offering seamless integration with the host language that enables the addition of powerful extensions while generating specialized code for the entire computation.
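The generic fixpoint resolution mentioned here is commonly implemented with semi-naive evaluation: each round joins only the facts derived in the previous round against the other relations, and iteration stops when no new facts appear. The following is a minimal Python sketch of that textbook strategy for reachability, assuming edges are given as a set of pairs. The function `transitive_closure` and the relation names `edge` and `path` are hypothetical; this is neither Flan's Scala interface nor its generated code. The dictionary `edge_by_src` stands in for a hash index on the join column.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the (hypothetical) program
        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    Illustrative sketch only, not Flan's API.
    """
    # Hash index on edge's first column: y -> every z with edge(y, z).
    edge_by_src = defaultdict(set)
    for src, dst in edges:
        edge_by_src[src].add(dst)

    path = set(edges)   # facts from the non-recursive rule
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Join only the new facts against edge; older facts were already joined.
        new_facts = {
            (x, z)
            for x, y in delta
            for z in edge_by_src.get(y, ())
        }
        delta = new_facts - path  # keep only facts not derived before
        path |= delta
    return path
```

Because only `delta` is joined in each round, a fact such as `path(1, 3)` is derived from new information once rather than being recomputed on every iteration. Roughly, a specializing compiler aims to emit loops like this one tailored to a fixed program, rather than interpreting the rules at run time.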
Flan’s simple operator interface allows the addition of an extensive set of features, including arbitrary aggregates, user-defined functions, and lattices, with multiple execution strategies such as binary and multi-way joins, supported by different indexing structures like specialized trees and hash tables, with minimal effort. We evaluate our system on a variety of benchmarks and compare it to established Datalog engines. Our results demonstrate competitive performance and speedups in the range of 1.4× to 12.5× compared to state-of-the-art systems for workloads of practical importance.
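The lattice extension listed above replaces set union with a lattice join: a relation column holds a lattice value, and a newly derived value is merged with the stored one instead of being added as a separate tuple. As a minimal sketch, assuming a single-source shortest-path program over the min lattice (the smaller distance wins) and a graph with no negative-weight cycles, the Python code below computes this fixpoint using a hash index on edge sources. The function `shortest_distances` and the rule notation in its docstring are illustrative assumptions, not Flan's interface.

```python
import math
from collections import defaultdict


def shortest_distances(edges, source):
    """Least fixpoint of a (hypothetical) lattice-valued program
        dist(source, 0).
        dist(y, d + w) :- dist(x, d), edge(x, y, w).
    where dist's second column is merged with min, the lattice join.
    Illustrative sketch only, not Flan's API; assumes no negative-weight cycles.
    """
    out = defaultdict(list)  # hash index: node -> [(successor, weight)]
    for x, y, w in edges:
        out[x].append((y, w))

    dist = {source: 0}
    delta = {source}  # nodes whose value changed in the previous round
    while delta:
        changed = set()
        for x in delta:
            for y, w in out.get(x, ()):
                candidate = dist[x] + w
                # Lattice join (min): update in place only if the value improves.
                if candidate < dist.get(y, math.inf):
                    dist[y] = candidate
                    changed.add(y)
        delta = changed
    return dist
```

For example, with edges a→b (4), a→c (1), c→b (2), b→d (1) and source `a`, the result is `{"a": 0, "b": 3, "c": 1, "d": 4}`: the first estimate for `b` is 4 and is later lowered to 3 by the join. Keeping a single value per node, rather than one tuple per derived path length, is what lets the iteration terminate on cyclic graphs such as 1→2→1.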