The Ring: Worst-Case Optimal Joins in Graph Databases using (Almost) No Extra Space

IF 2.2 | Zone 2, Computer Science | Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Diego Arroyuelo, Adrián Gómez-Brandón, Aidan Hogan, Gonzalo Navarro, Juan Reutter, Javiel Rojas-Ledesma, Adrián Soto
ACM Transactions on Database Systems | Published 2024-02-08 | DOI: 10.1145/3644824 (https://doi.org/10.1145/3644824)
Citations: 0

Abstract

The Ring: Worst-Case Optimal Joins in Graph Databases using (Almost) No Extra Space

We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a ring, regards each triple as a cyclic string of length 3. Each rotation of the triples is lexicographically sorted and the values of the last attribute are stored as a column, so we obtain the order of the next column by stably re-sorting the triples by its attribute. We show that, by representing the columns with a compact data structure called a wavelet tree, this ordering enables forward and backward navigation between columns without needing pointers. These wavelet trees further support wco join algorithms and cardinality estimations for query planning. While traditional data structures such as B-Trees, tries, etc., require 6 index orders to support all possible wco joins over triples, we can use one ring to index them all. This ring replaces the graph and uses only sublinear extra space, thus supporting wco joins in almost no space beyond storing the graph itself. Experiments querying a large graph (Wikidata) in memory show that the ring offers nearly the best overall query times while using only a small fraction of the space required by several state-of-the-art approaches.
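The column construction described above can be illustrated with a minimal sketch. It assumes triples are plain `(s, p, o)` tuples of integers and keeps each column as an ordinary Python list; the scheme in the abstract represents the columns with wavelet trees, which this toy version does not attempt. The names `build_ring` and `columns` are hypothetical and chosen only for illustration.

```python
def build_ring(triples):
    """Toy ring construction over (s, p, o) triples.

    Assumption: columns are plain lists, not wavelet trees, so this only
    shows how the three columns and their orders relate to one another.
    """
    order = sorted(set(triples))  # triples in lexicographic s-p-o order
    columns = {}
    # Each step stores the last attribute of the current rotation as a
    # column, then stably re-sorts by that attribute to reach the next
    # rotation: spo -> osp -> pos -> spo.
    for rotation, last in (("spo", 2), ("osp", 1), ("pos", 0)):
        columns[rotation] = [t[last] for t in order]
        # sorted() is stable, so ties keep the previous rotation's order.
        order = sorted(order, key=lambda t, a=last: t[a])
    # After three stable sorts the triples are back in s-p-o order.
    return columns, order


triples = [(2, 1, 3), (1, 2, 3), (1, 2, 4), (3, 1, 1)]
columns, final_order = build_ring(triples)
```

In this sketch, three stable sorts bring the triples back to their starting order, which is the cyclic property that gives the ring its name.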

We then turn our attention to some theoretical results for indexing tables of arity d higher than 3 in a way that supports wco joins. While a single ring of length d no longer suffices to cover all d! orders, we need far fewer rings to index them all: O(2^d) rings with a small constant. For example, we need 5 rings instead of 120 orders for d = 5. We show that our rings become a particular case of what we dub order graphs, whose nodes are attribute orders and where stably sorting by some attribute leads us from one order to another, thereby inducing an edge labeled by the attribute. The index is then the set of columns associated with the edges, and a set of rings is just one possible graph shape. We show that other shapes, for example a single ring instead of several of length d, can lead to even smaller indexes, and that other more general shapes are also possible. For example, we handle d = 5 attributes within space equivalent to 4 rings.
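The order-graph edges mentioned above can be sketched at the level of attribute orders alone. The sketch assumes an order is a tuple of attribute names and relies on one fact: stably sorting a table that is in lexicographic order (a1, ..., ad) by attribute x leaves it in order (x, followed by the remaining attributes in their previous sequence). The function names `stable_sort_edge`, `ring_cycle`, and `num_orders` are hypothetical; the sketch does not reproduce the paper's constructions for choosing which rings or graph shapes to index.

```python
from math import factorial


def stable_sort_edge(order, attr):
    """Follow the order-graph edge labeled `attr` out of node `order`.

    Stably sorting by `attr` moves it to the front and keeps the other
    attributes in their previous relative sequence.
    """
    if attr not in order:
        raise ValueError(f"unknown attribute: {attr!r}")
    return (attr,) + tuple(a for a in order if a != attr)


def ring_cycle(order):
    """List the orders visited by always sorting by the last attribute."""
    cycle = [order]
    nxt = stable_sort_edge(order, order[-1])
    while nxt != order:
        cycle.append(nxt)
        nxt = stable_sort_edge(nxt, nxt[-1])
    return cycle


def num_orders(d):
    """Number of distinct attribute orders for arity d, i.e. d!."""
    return factorial(d)
```

For example, `ring_cycle(("s", "p", "o"))` visits the three rotations of a triple ring, and `num_orders(5)` gives the 120 orders that the abstract contrasts with 5 rings.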

Source journal
ACM Transactions on Database Systems (category: Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 15
Review time: >12 weeks
Journal description: Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.