IndeGx: A model and a framework for indexing RDF knowledge graphs with SPARQL-based test suits

IF 3.1 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Web Semantics Pub Date : 2023-04-01 DOI:10.1016/j.websem.2023.100775

Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel

{"title":"IndeGx: A model and a framework for indexing RDF knowledge graphs with SPARQL-based test suits","authors":"Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel","doi":"10.1016/j.websem.2023.100775","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, a large number of RDF datasets have been built and published on the Web in fields as diverse as linguistics or life sciences, as well as general datasets such as DBpedia or Wikidata. The joint exploitation of these datasets requires specific knowledge about their content, access points, and commonalities. However, not all datasets contain a self-description, and not all access points can handle the complex queries used to generate such a description.</p><p>In this article, we provide a standard-based approach to generate the description of a dataset. The generated descriptions as well as the process of their computation are expressed using standard vocabularies and languages. We implemented our approach into a framework, called IndeGx, where each indexing feature and its computation is collaboratively and declaratively defined in a GitHub repository. We have experimented IndeGx on a set of 339 RDF datasets with endpoints listed in public catalogs, over 8 months. The results show that we can collect, as much as possible, important characteristics of the datasets depending on their availability and capacities. The resulting index captures the commonalities, variety and disparity in the offered content and services and it provides an important support to any application designed to query RDF datasets.</p></div>","PeriodicalId":49951,"journal":{"name":"Journal of Web Semantics","volume":"76 ","pages":"Article 100775"},"PeriodicalIF":3.1000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Web Semantics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1570826823000045","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 6

Abstract

In recent years, a large number of RDF datasets have been built and published on the Web in fields as diverse as linguistics or life sciences, as well as general datasets such as DBpedia or Wikidata. The joint exploitation of these datasets requires specific knowledge about their content, access points, and commonalities. However, not all datasets contain a self-description, and not all access points can handle the complex queries used to generate such a description.

In this article, we provide a standard-based approach to generate the description of a dataset. The generated descriptions as well as the process of their computation are expressed using standard vocabularies and languages. We implemented our approach into a framework, called IndeGx, where each indexing feature and its computation is collaboratively and declaratively defined in a GitHub repository. We have experimented IndeGx on a set of 339 RDF datasets with endpoints listed in public catalogs, over 8 months. The results show that we can collect, as much as possible, important characteristics of the datasets depending on their availability and capacities. The resulting index captures the commonalities, variety and disparity in the offered content and services and it provides an important support to any application designed to query RDF datasets.

查看原文本刊更多论文

indexx:使用基于sparql的测试套件为RDF知识图建立索引的模型和框架

近年来，在语言学或生命科学等不同领域，以及DBpedia或Wikidata等通用数据集，已经在Web上构建和发布了大量RDF数据集。联合利用这些数据集需要对它们的内容、访问点和共性有特定的了解。然而，并不是所有的数据集都包含自我描述，也不是所有的访问点都能处理用于生成这种描述的复杂查询。在本文中，我们提供了一种基于标准的方法来生成数据集的描述。生成的描述及其计算过程使用标准词汇和语言表示。我们将我们的方法实现到一个名为IndeGx的框架中，其中每个索引特性及其计算都是在GitHub存储库中以协作和声明的方式定义的。我们在一组339个RDF数据集上进行了indexx实验，这些数据集的端点列在公共目录中，耗时8个月。结果表明，我们可以根据数据集的可用性和容量尽可能多地收集数据集的重要特征。生成的索引捕获了所提供内容和服务的共性、多样性和差异性，它为任何设计用于查询RDF数据集的应用程序提供了重要的支持。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Web Semantics 工程技术-计算机：人工智能

CiteScore

6.20

自引率

12.00%

发文量

审稿时长

14.6 weeks

期刊介绍： The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include: knowledge technologies, ontology, agents, databases and the semantic grid, obviously disciplines like information retrieval, language technology, human-computer interaction and knowledge discovery are of major relevance as well. All aspects of the Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.