Bring Your Own Data Structures to Datalog

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Arash Sahebolamri, Langston Barrett, Scott Moore, Kristopher Micinski
{"title":"Bring Your Own Data Structures to Datalog","authors":"Arash Sahebolamri, Langston Barrett, Scott Moore, Kristopher Micinski","doi":"10.1145/3622840","DOIUrl":null,"url":null,"abstract":"The restricted logic programming language Datalog has become a popular implementation target for deductive-analytic workloads including social-media analytics and program analysis. Modern Datalog engines compile Datalog rules to joins over explicit representations of relations—often B-trees or hash maps. While these modern engines have enabled high scalability in many application domains, they have a crucial weakness: achieving the desired algorithmic complexity may be impossible due to representation-imposed overhead of the engine’s data structures. In this paper, we present the \"Bring Your Own Data Structures\" (Byods) approach, in the form of a DSL embedded in Rust. Using Byods, an engineer writes logical rules which are implicitly parametric on the concrete data structure representation; our implementation provides an interface to enable \"bringing their own\" data structures to represent relations, which harmoniously interact with code generated by our compiler (implemented as Rust procedural macros). We formalize the semantics of Byods as an extension of Datalog’s; our formalization captures the key properties demanded of data structures compatible with Byods, including properties required for incrementalized (semi-naïve) evaluation. We detail many applications of the Byods approach, implementing analyses requiring specialized data structures for transitive and equivalence relations to scale, including an optimized version of the Rust borrow checker Polonius; highly-parallel PageRank made possible by lattices; and a large-scale analysis of LLVM utilizing index-sharing to scale. Our results show that Byods offers both improved algorithmic scalability (reduced time and/or space complexity) and runtimes competitive with state-of-the-art parallelizing Datalog solvers.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"1 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622840","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 1

Abstract

The restricted logic programming language Datalog has become a popular implementation target for deductive-analytic workloads including social-media analytics and program analysis. Modern Datalog engines compile Datalog rules to joins over explicit representations of relations—often B-trees or hash maps. While these modern engines have enabled high scalability in many application domains, they have a crucial weakness: achieving the desired algorithmic complexity may be impossible due to representation-imposed overhead of the engine’s data structures. In this paper, we present the "Bring Your Own Data Structures" (Byods) approach, in the form of a DSL embedded in Rust. Using Byods, an engineer writes logical rules which are implicitly parametric on the concrete data structure representation; our implementation provides an interface to enable "bringing their own" data structures to represent relations, which harmoniously interact with code generated by our compiler (implemented as Rust procedural macros). We formalize the semantics of Byods as an extension of Datalog’s; our formalization captures the key properties demanded of data structures compatible with Byods, including properties required for incrementalized (semi-naïve) evaluation. We detail many applications of the Byods approach, implementing analyses requiring specialized data structures for transitive and equivalence relations to scale, including an optimized version of the Rust borrow checker Polonius; highly-parallel PageRank made possible by lattices; and a large-scale analysis of LLVM utilizing index-sharing to scale. Our results show that Byods offers both improved algorithmic scalability (reduced time and/or space complexity) and runtimes competitive with state-of-the-art parallelizing Datalog solvers.
把你自己的数据结构带到Datalog
受限制的逻辑编程语言Datalog已经成为演绎分析工作负载(包括社交媒体分析和程序分析)的流行实现目标。现代Datalog引擎将Datalog规则编译为关系的显式表示(通常是b树或哈希映射)上的连接。虽然这些现代引擎在许多应用领域中具有很高的可伸缩性,但它们有一个关键的弱点:由于引擎的数据结构的表示强加的开销,实现期望的算法复杂性可能是不可能的。在本文中,我们以嵌入在Rust中的DSL的形式提出了“自带数据结构”(Byods)方法。使用byod,工程师在具体的数据结构表示上编写隐含参数化的逻辑规则;我们的实现提供了一个接口,允许“自带”数据结构来表示关系,这些关系与我们的编译器生成的代码(作为Rust过程宏实现)和谐地交互。我们将byod的语义形式化,作为Datalog的扩展;我们的形式化捕获了与byod兼容的数据结构所需的关键属性,包括递增(semi-naïve)求值所需的属性。我们详细介绍了Byods方法的许多应用,实现了需要专门的数据结构来扩展传递和等价关系的分析,包括Rust借用检查器Polonius的优化版本;通过格实现高度并行的PageRank;以及利用索引共享进行扩展的LLVM大规模分析。我们的结果表明,Byods提供了改进的算法可伸缩性(减少了时间和/或空间复杂性)和运行时,与最先进的并行Datalog解决方案竞争。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信