Inference-based schema discovery for RDF data

IF 2.7 · Zone 3 (Computer Science) · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering Pub Date : 2025-07-19 DOI:10.1016/j.datak.2025.102491

Redouane Bouhamoum, Zoubida Kedad, Stéphane Lopes

Data & Knowledge Engineering, volume 160, Article 102491 (journal article; not open access)

Full text: https://www.sciencedirect.com/science/article/pii/S0169023X25000862

Citations: 0


Inference-based schema discovery for RDF data

The Semantic Web represents a huge information space where an increasing number of datasets, described in RDF, are made available to users and applications. In this context, the data is not constrained by a predefined schema. In RDF datasets, the schema may be incomplete or even missing. While this offers high flexibility in creating data sources, it also makes their use difficult. Several works have addressed the problem of automatic schema discovery for RDF datasets, but existing approaches rely only on the explicit information provided by the data source, which may limit the quality of the results. Indeed, in an RDF data source, an entity is described by explicitly declared properties, but also by implicit properties that can be derived using reasoning rules. These implicit properties are not considered by existing schema discovery approaches.

In this work, we propose a first contribution towards a hybrid schema discovery approach capable of exploiting all the semantics of a data source, which is represented not only by the explicitly declared triples, but also by the ones that can be inferred through reasoning. By considering both explicit and implicit properties, the quality of the generated schema is improved. We provide a scalable design of our approach to enable the processing of large RDF data sources while improving the quality of the results. We present some experiments which demonstrate the efficiency of our proposal and the quality of the discovered schema.
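The distinction the abstract draws between explicit and implicit properties can be illustrated with a small sketch. This is not the authors' algorithm: it applies only one standard RDFS entailment rule (rdfs7, `rdfs:subPropertyOf`) to a toy set of triples, and the `ex:` identifiers and the function names are hypothetical, chosen for illustration.

```python
# Illustrative sketch only; NOT the method proposed in the paper.
# Triples are plain (subject, predicate, object) tuples; the "ex:" names are made up.
SUBPROP = "rdfs:subPropertyOf"


def infer_subproperty_triples(triples):
    """Apply RDFS rule rdfs7 until nothing new is derived:
    (s p o) and (p rdfs:subPropertyOf q) entail (s q o)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub_pairs = {(p, q) for (p, pred, q) in closed if pred == SUBPROP}
        for (s, p, o) in list(closed):
            for (p1, q) in sub_pairs:
                if p == p1 and (s, q, o) not in closed:
                    closed.add((s, q, o))
                    changed = True
    return closed


def property_sets(triples):
    """Map each entity to the set of properties describing it (schema triples skipped)."""
    result = {}
    for s, p, _ in triples:
        if p == SUBPROP:
            continue
        result.setdefault(s, set()).add(p)
    return result


explicit = {
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "ex:creator", "ex:bob"),
    ("ex:author", SUBPROP, "ex:creator"),
}
inferred = infer_subproperty_triples(explicit)

explicit_sets = property_sets(explicit)
inferred_sets = property_sets(inferred)

# Properties the two entities share, before and after inference.
shared_before = explicit_sets["ex:book1"] & explicit_sets["ex:book2"]
shared_after = inferred_sets["ex:book1"] & inferred_sets["ex:book2"]
```

With only the explicit triples, the two entities share no property; once the implied `ex:creator` triple is added, they share one. A schema discovery step that groups entities by their property sets would therefore see them as more similar after inference, which is the kind of effect the abstract refers to.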


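Source journal

Data & Knowledge Engineering (Engineering Technology - Computer Science: Artificial Intelligence)

CiteScore: 5.00

Self-citation rate: 0.00%

Articles published:

Review time: 6 months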

Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.