Characteristic sets profile features: Estimation and application to SPARQL query planning

IF 2.9 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web Pub Date : 2022-09-05 DOI:10.3233/sw-222903

Lars Heling, Maribel Acosta

{"title":"Characteristic sets profile features: Estimation and application to SPARQL query planning","authors":"Lars Heling, Maribel Acosta","doi":"10.3233/sw-222903","DOIUrl":null,"url":null,"abstract":"RDF dataset profiling is the task of extracting a formal representation of a dataset’s features. Such features may cover various aspects of the RDF dataset ranging from information on licensing and provenance to statistical descriptors of the data distribution and its semantics. In this work, we focus on the characteristics sets profile features that capture both structural and semantic information of an RDF dataset, making them a valuable resource for different downstream applications. While previous research demonstrated the benefits of characteristic sets in centralized and federated query processing, access to these fine-grained statistics is taken for granted. However, especially in federated query processing, computing this profile feature is challenging as it can be difficult and/or costly to access and process the entire data from all federation members. We address this shortcoming by introducing the concept of a profile feature estimation and propose a sampling-based approach to generate estimations for the characteristic sets profile feature. In addition, we showcase the applicability of these feature estimations in federated querying by proposing a query planning approach that is specifically designed to leverage these feature estimations. In our first experimental study, we intrinsically evaluate our approach on the representativeness of the feature estimation. The results show that even small samples of just 0.5 % of the original graph’s entities allow for estimating both structural and statistical properties of the characteristic sets profile features. Our second experimental study extrinsically evaluates the estimations by investigating their applicability in our query planner using the well-known FedBench benchmark. The results of the experiments show that the estimated profile features allow for obtaining efficient query plans.","PeriodicalId":48694,"journal":{"name":"Semantic Web","volume":"61 1","pages":"491-526"},"PeriodicalIF":2.9000,"publicationDate":"2022-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Semantic Web","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/sw-222903","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 1

Abstract

RDF dataset profiling is the task of extracting a formal representation of a dataset’s features. Such features may cover various aspects of the RDF dataset ranging from information on licensing and provenance to statistical descriptors of the data distribution and its semantics. In this work, we focus on the characteristics sets profile features that capture both structural and semantic information of an RDF dataset, making them a valuable resource for different downstream applications. While previous research demonstrated the benefits of characteristic sets in centralized and federated query processing, access to these fine-grained statistics is taken for granted. However, especially in federated query processing, computing this profile feature is challenging as it can be difficult and/or costly to access and process the entire data from all federation members. We address this shortcoming by introducing the concept of a profile feature estimation and propose a sampling-based approach to generate estimations for the characteristic sets profile feature. In addition, we showcase the applicability of these feature estimations in federated querying by proposing a query planning approach that is specifically designed to leverage these feature estimations. In our first experimental study, we intrinsically evaluate our approach on the representativeness of the feature estimation. The results show that even small samples of just 0.5 % of the original graph’s entities allow for estimating both structural and statistical properties of the characteristic sets profile features. Our second experimental study extrinsically evaluates the estimations by investigating their applicability in our query planner using the well-known FedBench benchmark. The results of the experiments show that the estimated profile features allow for obtaining efficient query plans.

查看原文本刊更多论文

特征集概要特征:SPARQL查询计划的估计和应用

RDF数据集概要是提取数据集特征的形式化表示的任务。这些特性可能涵盖RDF数据集的各个方面，从许可和来源信息到数据分布及其语义的统计描述符。在这项工作中，我们专注于特征集概要特征，这些特征集捕获RDF数据集的结构和语义信息，使它们成为不同下游应用程序的宝贵资源。虽然以前的研究证明了特征集在集中式和联邦查询处理中的好处，但对这些细粒度统计数据的访问被认为是理所当然的。但是，特别是在联邦查询处理中，计算这个概要特性是具有挑战性的，因为访问和处理来自所有联邦成员的全部数据可能很困难，而且/或成本很高。我们通过引入轮廓特征估计的概念来解决这一缺点，并提出了一种基于采样的方法来生成特征集轮廓特征的估计。此外，我们通过提出一种专门为利用这些特征估计而设计的查询规划方法，展示了这些特征估计在联邦查询中的适用性。在我们的第一个实验研究中，我们从本质上评估了我们的方法对特征估计的代表性。结果表明，即使只有原始图实体的0.5%的小样本也可以估计特征集轮廓特征的结构和统计特性。我们的第二个实验研究通过使用著名的FedBench基准测试调查它们在我们的查询规划器中的适用性，从外部评估这些估计。实验结果表明，估计的轮廓特征可以获得有效的查询计划。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Semantic Web COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

8.30

自引率

6.70%

发文量

期刊介绍： The journal Semantic Web – Interoperability, Usability, Applicability brings together researchers from various fields which share the vision and need for more effective and meaningful ways to share information across agents and services on the future internet and elsewhere. As such, Semantic Web technologies shall support the seamless integration of data, on-the-fly composition and interoperation of Web services, as well as more intuitive search engines. The semantics – or meaning – of information, however, cannot be defined without a context, which makes personalization, trust, and provenance core topics for Semantic Web research. New retrieval paradigms, user interfaces, and visualization techniques have to unleash the power of the Semantic Web and at the same time hide its complexity from the user. Based on this vision, the journal welcomes contributions ranging from theoretical and foundational research over methods and tools to descriptions of concrete ontologies and applications in all areas. We especially welcome papers which add a social, spatial, and temporal dimension to Semantic Web research, as well as application-oriented papers making use of formal semantics.