SPARQL Optimization Using Re-ordering Joining Patterns with Surrogate Key Concept and Subset Patterns

IF 1; Zone 4, Computer Science; JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Web Engineering Pub Date : 2024-03-01 DOI:10.13052/jwe1540-9589.2334

Rupal Gupta;Sanjay Kumar Malik

Journal of Web Engineering, vol. 23, no. 3, pp. 393-430. https://ieeexplore.ieee.org/document/10547280/

Citations: 0

Abstract

Semantic web data resides on the web in the form of knowledge graphs known as RDF graphs, and searching the web has always been a crucial task. RDF data on the semantic web is retrieved with the SPARQL query language, which is based on triple patterns and joins. Optimization of SPARQL queries has been a persistent concern for decades because of the large number of triple patterns associated with RDF data. Although several researchers have put considerable effort into SPARQL query optimization, the diversity of approaches makes the subject difficult to understand from scratch. This paper analyses various optimization techniques for SPARQL queries used with the semantic web to process knowledge graphs. These techniques include join-based, heuristic-based, rule-based, and indexing-based approaches. The paper aims to help researchers in this domain grasp the core concepts of SPARQL execution and the optimization approaches used for query processing, which can also help in other domains such as linked open data and information retrieval. The paper also proposes an optimization algorithm, HSOA (hybrid SPARQL optimization algorithm), which combines features of index-based, cost-based, and triple-reordering-based optimization approaches. The proposed hybrid algorithm is designed specifically for n-triple RDF data and comprises subset patterns and surrogate key concepts. The results produced by the proposed algorithm are encouraging, and the algorithm has been tested and compared on benchmark datasets and SPARQL queries such as LUBM, BSBM, and SP2Bench.



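Source journal

Journal of Web Engineering (Engineering and Technology - Computer Science: Theory and Methods)

CiteScore

1.80

Self-citation rate

12.50%

Articles published

Review time

9 months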

Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.