Minimum motif-cut: a workload-aware RDF graph partitioning strategy
Peng Peng, Shengyi Ji, M. Tamer Özsu, Lei Zou
The VLDB Journal, 34(1). Published 2024-07-08.
DOI: 10.1007/s00778-024-00860-1 (https://doi.org/10.1007/s00778-024-00860-1)
Citation count: 0
Abstract
In designing a distributed RDF system, it is common to divide an RDF graph into subgraphs, called partitions, which are then distributed. Graph partitioning in general, and RDF graph partitioning in particular, are challenging problems. In this paper, we propose an RDF graph partitioning approach, called Minimum Motif-Cut (MMC for short), that maximizes the number of SPARQL queries in a workload that can be evaluated within one partition without interpartition joins. A motif is a common structure that occurs in queries. We prove that the MMC partitioning problem is NP-complete and propose two greedy heuristic algorithms to solve it. One algorithm is basic, while the other is more advanced and optimized for data localization. A query is decomposed into a set of independently evaluable subqueries based on the RDF graph partitioning. The subqueries are executed in a distributed fashion, and their results are assembled into the final result. Extensive experiments over synthetic and real RDF graphs and their corresponding logs show that the proposed technique significantly reduces interpartition joins and results in good performance.
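To make the objective in the abstract concrete, the following minimal Python sketch counts how many motif matches are "cut" by a given partitioning, that is, how many matches have vertices in more than one partition and would therefore need an interpartition join. This is only an illustration of the quantity being minimized, not the authors' algorithm. The function name `count_cut_matches`, the `assignment` and `matches` structures, and the toy data are all assumptions made for this example.

```python
from typing import Dict, Iterable, Set


def count_cut_matches(assignment: Dict[str, int],
                      matches: Iterable[Set[str]]) -> int:
    """Count motif matches whose vertices span more than one partition.

    assignment: hypothetical mapping from vertex id to partition id.
    matches:    hypothetical collection of motif matches, each given
                as the set of vertex ids it touches.
    """
    cut = 0
    for match in matches:
        # Collect the partitions touched by this match.
        partitions = {assignment[v] for v in match}
        if len(partitions) > 1:
            cut += 1  # evaluating this match needs an interpartition join
    return cut


# Toy data (hypothetical): four vertices and four motif matches.
matches = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "b", "c"}]

# Two candidate partitionings of the same vertices.
assignment_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
assignment_2 = {"a": 0, "b": 0, "c": 0, "d": 1}

print(count_cut_matches(assignment_1, matches))  # 2 matches are cut
print(count_cut_matches(assignment_2, matches))  # 1 match is cut
```

Under this toy workload, the second assignment cuts fewer motif matches, so more queries could be answered inside a single partition. A real partitioner would also have to respect balance constraints across partitions, which this sketch ignores.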