A. Akhter, Muhammad Saleem, Alexander Bigerl, A. N. Ngomo
Title: Efficient RDF Knowledge Graph Partitioning Using Querying Workload
Published in: Proceedings of the 11th on Knowledge Capture Conference, 2021-12-02
DOI: https://doi.org/10.1145/3460210.3493577
Citations: 2
Abstract
Data partitioning is an effective way to manage large datasets. While a broad range of RDF graph partitioning techniques has been proposed in previous work, little attention has been given to workload-aware RDF graph partitioning. In this paper, we propose two techniques that use the querying workload to detect the portions of RDF graphs that are often queried together. Our techniques leverage predicate co-occurrences in SPARQL queries. By detecting highly co-occurring predicates, our techniques can keep the data pertaining to these predicates in the same data partition. We evaluate the proposed partitioning techniques using several real-world datasets and query benchmarks generated by the FEASIBLE SPARQL benchmark generation framework. Our evaluation results show that the proposed techniques achieve better query runtime performance than previous techniques.
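To make the idea of predicate co-occurrence concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the workload has already been reduced to one list of predicates per SPARQL query (the parsing step is omitted), and it uses a simple greedy merge as a stand-in for the paper's two techniques. The names `cooccurrence_counts` and `partition_predicates`, and the sample workload, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(workload):
    """Count how often each unordered predicate pair appears in the same query."""
    counts = Counter()
    for predicates in workload:
        for pair in combinations(sorted(set(predicates)), 2):
            counts[pair] += 1
    return counts


def partition_predicates(workload, num_partitions):
    """Greedily merge the most frequently co-occurring predicates into groups.

    Illustrative assumption: merging stops once only num_partitions groups
    remain; partition sizes are not balanced.
    """
    counts = cooccurrence_counts(workload)
    predicates = sorted({p for query in workload for p in query})
    parent = {p: p for p in predicates}  # union-find forest

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    groups = len(predicates)
    # Most frequent pairs first; ties are broken by pair name for determinism.
    for (a, b), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if groups <= num_partitions:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
            groups -= 1

    # Map each remaining group to a partition id (wraps if groups are left over).
    roots = sorted({find(p) for p in predicates})
    partition_of_root = {r: i % num_partitions for i, r in enumerate(roots)}
    return {p: partition_of_root[find(p)] for p in predicates}


# Hypothetical workload: each entry lists the predicates used by one query.
workload = [
    ["foaf:name", "foaf:knows"],
    ["foaf:name", "foaf:knows", "foaf:mbox"],
    ["foaf:name", "foaf:mbox"],
    ["dbo:birthPlace", "dbo:birthDate"],
    ["dbo:birthPlace", "dbo:birthDate"],
    ["foaf:name", "dbo:birthPlace"],
]
assignment = partition_predicates(workload, num_partitions=2)
for predicate, partition in sorted(assignment.items()):
    print(predicate, "-> partition", partition)
```

In this sketch every triple would be stored in the partition assigned to its predicate, so a query that touches only co-occurring predicates can be answered from a single partition.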