DWAHP: Workload Aware Hybrid Partitioning and Distribution of RDF Data
Trupti Padiya, Minal Bhise
Proceedings of the 21st International Database Engineering & Applications Symposium, 2017-07-12
DOI: https://doi.org/10.1145/3105831.3105864
Citation count: 15
Abstract
The volume of RDF data has grown to the point where it must be partitioned across multiple nodes. Recent years have seen significant work on solutions for managing RDF data in distributed environments. We propose a workload-aware hybrid partitioning approach for a distributed environment. The objective of our approach is to reduce query joins and inter-node communication, leading to faster execution of frequent queries. Our approach considers a query workload and partitions data based on workload information. It distributes data by exploiting the underlying structural relationships between properties, captured in a property reachability matrix, to optimize query performance. DWAHP eliminates inter-node communication cost for frequent queries such as linear and star queries, and answers 83% of the frequent query workload without inter-node communication. DWAHP is compared with state-of-the-art solutions in terms of query execution time, query cost, storage space, and inter-node communication. It demonstrates significant improvement over the state-of-the-art solutions.
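The abstract does not define how the property reachability matrix is built, so the following is only an illustrative sketch under stated assumptions, not the authors' construction. It assumes that property p "reaches" property q when some object of a p-triple appears as the subject of a q-triple (the join pattern of a linear query), and that reachability is closed transitively. The function name `property_reachability` and the sample triples are hypothetical.

```python
from itertools import product


def property_reachability(triples):
    """Return (props, reach), where reach[i][j] is True when a chain of
    object-to-subject joins leads from property i to property j.

    Assumption: this definition of reachability is illustrative only;
    the source abstract does not specify the matrix construction.
    """
    props = sorted({p for _, p, _ in triples})
    index = {p: i for i, p in enumerate(props)}
    n = len(props)

    # Collect the subjects and objects observed for each property.
    subjects = {p: set() for p in props}
    objects = {p: set() for p in props}
    for s, p, o in triples:
        subjects[p].add(s)
        objects[p].add(o)

    # Direct reachability: some object of p is also a subject of q.
    reach = [[False] * n for _ in range(n)]
    for p, q in product(props, repeat=2):
        if objects[p] & subjects[q]:
            reach[index[p]][index[q]] = True

    # Transitive closure (Warshall's algorithm) over the boolean matrix.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return props, reach


# Hypothetical sample data: a three-hop chain plus one unrelated literal.
triples = [
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "berlin"),
    ("berlin", "partOf", "germany"),
    ("alice", "name", "Alice"),
]
props, reach = property_reachability(triples)
```

Under this reading, properties that reach one another (here `worksFor`, `locatedIn`, and `partOf`) would be candidates for placement on the same node, so that a linear query over them needs no inter-node communication. How DWAHP actually combines the matrix with workload information is not stated in the abstract.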