Experiences with Virtuoso Cluster RDF Column Store

P. Boncz, O. Erling, M. Pham
Published in: Linked Data Management. DOI: 10.1201/b16859-13 (https://doi.org/10.1201/b16859-13)
Citations: 6

Abstract

Virtuoso Column Store [185] introduces vectorized execution into the Virtuoso DBMS. Additionally, its scale-out version, which allows running the system on a cluster, has been significantly redesigned. This article discusses advances in scale-out support in Virtuoso and analyzes them on the Berlin SPARQL Benchmark (BSBM) [101]. To demonstrate the features of Virtuoso Cluster RDF Column Store, we first present micro-benchmarks on a small 2-node cluster with 10 billion triples. In the full evaluation we show that one can now scale out to a BSBM database of 150 billion triples. The latter experiment is a 750-fold increase over the previous largest BSBM report, and for the first time includes both the Explore and Business Intelligence workloads.

The storage scheme used by Virtuoso for storing RDF Subject-Property-Object triples pertaining to a Graph (hence we have quads, not triples) consists of five indexes: PSOG, POSG, SP, OP, GS. To be precise, PSOG is a B-tree with key (P,S,O,G), where P is a number identifying a property, S a subject, O an object and G the graph. Additionally, there is a B-tree holding URIs and a B-tree holding string literals, both of them used to encode strings (URIs and literals) into numerical identifiers. Users may alter the indexing scheme of Virtuoso, but this almost never happens. The last three indexes (SP, OP, GS) are projections of the first two covering indexes that contain only the unique combinations, so they are much smaller. We note that Virtuoso Column Store Edition (V7) departs from the previous Virtuoso editions (V6) in that
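The five-index layout described above can be illustrated with a small in-memory sketch. This is not Virtuoso's implementation: sorted Python lists stand in for B-trees, a single hypothetical `Dictionary` class stands in for the two separate URI and literal B-trees, and the key order of the projection indexes (S,P), (O,P), (G,S) is assumed from their names. The `ex:` identifiers and the helper names `build_indexes` and `prefix_scan` are invented for illustration.

```python
from bisect import bisect_left


class Dictionary:
    """Hypothetical stand-in for the URI and literal B-trees:
    encodes each distinct string as a dense integer id."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def encode(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def decode(self, i):
        return self._strings[i]


def build_indexes(quads):
    """Build the five indexes from (s, p, o, g) id tuples.
    Sorted lists approximate B-tree key order."""
    quads = set(quads)
    return {
        # Two covering indexes: every quad appears in both.
        "PSOG": sorted((p, s, o, g) for s, p, o, g in quads),
        "POSG": sorted((p, o, s, g) for s, p, o, g in quads),
        # Three projections: only the unique combinations are kept.
        "SP": sorted({(s, p) for s, p, o, g in quads}),
        "OP": sorted({(o, p) for s, p, o, g in quads}),
        "GS": sorted({(g, s) for s, p, o, g in quads}),
    }


def prefix_scan(index, prefix):
    """Return all keys in a sorted index that begin with `prefix`."""
    n = len(prefix)
    out = []
    for key in index[bisect_left(index, prefix):]:
        if key[:n] != prefix:
            break
        out.append(key)
    return out


# Made-up example data: four quads in one graph.
d = Dictionary()
raw = [
    ("ex:alice", "ex:knows", "ex:bob", "ex:g1"),
    ("ex:alice", "ex:knows", "ex:carol", "ex:g1"),
    ("ex:bob", "ex:knows", "ex:carol", "ex:g1"),
    ("ex:alice", "ex:name", "Alice", "ex:g1"),
]
quads = [tuple(d.encode(x) for x in q) for q in raw]
idx = build_indexes(quads)

# Whom does ex:alice know? Scan PSOG with the (P, S) prefix.
hits = prefix_scan(idx["PSOG"], (d.encode("ex:knows"), d.encode("ex:alice")))
known = sorted(d.decode(o) for _, _, o, _ in hits)
```

Even on this toy data the projections are smaller than the covering indexes: the four quads yield three (S,P) pairs, three (O,P) pairs, and two (G,S) pairs, which is the effect the text attributes to storing only unique combinations.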