Minimum motif-cut: a workload-aware RDF graph partitioning strategy

Peng Peng, Shengyi Ji, M. Tamer Özsu, Lei Zou
The VLDB Journal, 34 (1). Published 2024-07-08. DOI: 10.1007/s00778-024-00860-1 (https://doi.org/10.1007/s00778-024-00860-1)

Abstract

In designing a distributed RDF system, it is quite common to divide an RDF graph into subgraphs, called partitions, which are then distributed. Graph partitioning in general, and RDF graph partitioning in particular, are challenging problems. In this paper, we propose an RDF graph partitioning approach, called Minimum Motif-Cut (MMC for short), that maximizes the number of SPARQL queries in a workload that can be evaluated within one partition without interpartition joins. A motif is a common structure that occurs in queries. We prove that the MMC partitioning problem is NP-complete and propose two greedy heuristic algorithms to solve it. One algorithm is basic, while the other is more advanced and optimized for data localization. A query is decomposed into a set of independently evaluatable subqueries based on the RDF graph partitioning. The subqueries are executed in a distributed fashion, and their results are assembled into the final result. Extensive experiments over synthetic and real RDF graphs and their corresponding query logs show that the proposed technique significantly reduces interpartition joins and results in good performance.
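
The objective described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or either of its greedy heuristics; it only shows, under stated assumptions, what it means for a partitioning to "cut" a motif match. The motif shape (a two-edge path), the predicate names, the toy triples, and the helpers `find_path_motif_matches` and `count_cut_matches` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def find_path_motif_matches(
    triples: List[Triple], p1: str, p2: str
) -> List[Tuple[str, str, str]]:
    """Find matches of a hypothetical two-edge path motif: ?x p1 ?y . ?y p2 ?z"""
    # Index the second edge of the motif by its subject for a simple join.
    second_edge: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == p2:
            second_edge.setdefault(s, []).append(o)

    matches = []
    for s, p, o in triples:
        if p == p1:
            for z in second_edge.get(o, []):
                matches.append((s, o, z))
    return matches


def count_cut_matches(
    matches: List[Tuple[str, str, str]], partition_of: Dict[str, int]
) -> int:
    """Count matches whose vertices are spread over more than one partition."""
    return sum(
        1 for match in matches if len({partition_of[v] for v in match}) > 1
    )


# Toy data (assumed, not from the paper).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "knows", "dave"),
    ("dave", "worksAt", "initech"),
]
matches = find_path_motif_matches(triples, "knows", "worksAt")

# Partitioning A keeps each match inside a single partition.
partition_a = {"alice": 0, "bob": 0, "acme": 0,
               "carol": 1, "dave": 1, "initech": 1}
# Partitioning B splits both matches across partitions.
partition_b = {"alice": 0, "bob": 1, "acme": 1,
               "carol": 0, "dave": 1, "initech": 1}

cut_a = count_cut_matches(matches, partition_a)
cut_b = count_cut_matches(matches, partition_b)
print(cut_a, cut_b)
```

In this toy setting, a partitioning with fewer cut matches lets more instances of the motif be answered inside a single partition, which is the quantity a workload-aware strategy of this kind tries to improve.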

