Experiences with Virtuoso Cluster RDF Column Store

Linked Data Management Pub Date : 1900-01-01 DOI:10.1201/b16859-13

P. Boncz, O. Erling, M. Pham


Citations: 6

Abstract

Virtuoso Column Store [185] introduces vectorized execution into the Virtuoso DBMS. Additionally, its scale-out version, which allows running the system on a cluster, has been significantly redesigned. This article discusses advances in scale-out support in Virtuoso and analyzes them on the Berlin SPARQL Benchmark (BSBM) [101]. To demonstrate the features of Virtuoso Cluster RDF Column Store, we first present micro-benchmarks on a small 2-node cluster with 10 billion triples. In the full evaluation we show that one can now scale out to a BSBM database of 150 billion triples. The latter experiment is a 750-fold increase over the previous largest BSBM report, and for the first time includes both the Explore and Business Intelligence workloads.

The storage scheme used by Virtuoso for storing RDF Subject-Property-Object triples pertaining to a Graph (hence we have quads, not triples) consists of five indexes: PSOG, POSG, SP, OP, GS. To be precise, PSOG is a B-tree with key (P,S,O,G), where P is a number identifying a property, S a subject, O an object and G the graph. Additionally, there is a B-tree holding URIs and a B-tree holding string literals, both of them used to encode string(-URI)s into numerical identifiers. Users may alter Virtuoso's indexing scheme, but this almost never happens. The last three indexes (SP, OP, GS) are projections of the first two covering indexes and contain only the unique combinations; hence they are much smaller. We note that Virtuoso Column Store Edition (V7) departs from the previous Virtuoso editions (V6) in that
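The five-index layout described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Virtuoso's implementation: sorted Python lists stand in for the B-trees, a single dictionary stands in for the two separate URI and literal B-trees, and the class and method names (`QuadStore`, `add`, `objects`, `subjects`) are hypothetical.

```python
from bisect import bisect_left, insort


class QuadStore:
    """Toy model of the PSOG/POSG/SP/OP/GS layout (illustrative only)."""

    def __init__(self):
        self.ids = {}     # term string -> numeric identifier (assumed single dictionary)
        self.terms = []   # numeric identifier -> term string
        self.psog = []    # covering index, kept sorted by (P, S, O, G)
        self.posg = []    # covering index, kept sorted by (P, O, S, G)
        self.sp = set()   # projection: unique (S, P) combinations
        self.op = set()   # projection: unique (O, P) combinations
        self.gs = set()   # projection: unique (G, S) combinations

    def encode(self, term):
        """Map a URI or literal string to a numeric identifier."""
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def add(self, s, p, o, g):
        """Insert one quad into all five indexes, ignoring exact duplicates."""
        s, p, o, g = (self.encode(t) for t in (s, p, o, g))
        key = (p, s, o, g)
        pos = bisect_left(self.psog, key)
        if pos < len(self.psog) and self.psog[pos] == key:
            return
        self.psog.insert(pos, key)
        insort(self.posg, (p, o, s, g))
        self.sp.add((s, p))
        self.op.add((o, p))
        self.gs.add((g, s))

    def _scan(self, index, a, b):
        """Range scan: third key column of rows whose first two columns are (a, b)."""
        a, b = self.ids.get(a), self.ids.get(b)
        if a is None or b is None:
            return []
        out = []
        for row in index[bisect_left(index, (a, b)):]:
            if row[:2] != (a, b):
                break
            out.append(self.terms[row[2]])
        return out

    def objects(self, s, p):
        """Objects for a given subject and property, via a PSOG prefix scan."""
        return self._scan(self.psog, p, s)

    def subjects(self, p, o):
        """Subjects for a given property and object, via a POSG prefix scan."""
        return self._scan(self.posg, p, o)


store = QuadStore()
store.add("ex:alice", "ex:knows", "ex:bob", "ex:g1")
store.add("ex:alice", "ex:knows", "ex:carol", "ex:g1")
store.add("ex:dave", "ex:knows", "ex:bob", "ex:g2")
print(store.objects("ex:alice", "ex:knows"))
print(store.subjects("ex:knows", "ex:bob"))
print(len(store.psog), len(store.sp), len(store.op), len(store.gs))
```

In this toy data set the two covering indexes each hold three rows, while each projection holds only two entries, which illustrates why indexes that keep only unique combinations stay smaller than the covering ones.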
