Inference-based schema discovery for RDF data
Redouane Bouhamoum, Zoubida Kedad, Stéphane Lopes
Data & Knowledge Engineering, vol. 160, Article 102491, published 2025-07-19
DOI: 10.1016/j.datak.2025.102491
https://www.sciencedirect.com/science/article/pii/S0169023X25000862
The Semantic Web represents a huge information space in which an increasing number of datasets described in RDF are made available to users and applications. In this context, data is not constrained by a predefined schema: in RDF datasets, the schema may be incomplete or even missing. While this offers high flexibility in creating data sources, it also makes them harder to use. Several works have addressed automatic schema discovery for RDF datasets, but existing approaches rely only on the explicit information provided by the data source, which may limit the quality of their results. Indeed, an entity in an RDF data source is described not only by explicitly declared properties but also by implicit properties that can be derived using reasoning rules. Existing schema discovery approaches do not consider these implicit properties.
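To make the notion of implicit properties concrete, the following is a minimal sketch, not the authors' implementation. It assumes triples are plain Python tuples of prefixed-name strings and applies only four RDFS entailment patterns from the RDF 1.1 Semantics recommendation (rdfs2, rdfs3, rdfs5, rdfs7) by naive forward chaining. The abstract does not state which reasoning rules the paper uses, and the vocabulary (`:hasAuthor`, `:hasContributor`, `:Work`) is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUB_PROPERTY = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining of a small subset of RDFS rules up to a fixpoint."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUB_PROPERTY}
        dom = {(s, o) for s, p, o in closure if p == DOMAIN}
        rng = {(s, o) for s, p, o in closure if p == RANGE}
        new: Set[Triple] = set()
        for p, q in sub:  # rdfs5: subPropertyOf is transitive
            for q2, r in sub:
                if q == q2:
                    new.add((p, SUB_PROPERTY, r))
        for s, p, o in closure:
            for p2, q in sub:  # rdfs7: s p o and p subPropertyOf q => s q o
                if p == p2:
                    new.add((s, q, o))
            for p2, c in dom:  # rdfs2: s p o and p domain C => s type C
                if p == p2:
                    new.add((s, TYPE, c))
            for p2, c in rng:  # rdfs3: s p o and p range C => o type C
                if p == p2:
                    new.add((o, TYPE, c))
        if new <= closure:  # fixpoint: no new triple was derived
            return closure
        closure |= new


def properties(triples: Set[Triple], entity: str) -> Set[str]:
    """Properties describing an entity (rdf:type is kept separate)."""
    return {p for s, p, o in triples if s == entity and p != TYPE}


def types(triples: Set[Triple], entity: str) -> Set[str]:
    return {o for s, p, o in triples if s == entity and p == TYPE}


# Hypothetical toy dataset: one explicit fact plus two schema axioms.
data = {
    (":hasAuthor", SUB_PROPERTY, ":hasContributor"),
    (":hasContributor", DOMAIN, ":Work"),
    (":paper1", ":hasAuthor", ":alice"),
    (":paper1", ":title", '"Schema discovery"'),
}
closure = saturate(data)
print(sorted(properties(data, ":paper1")))     # explicit only
print(sorted(properties(closure, ":paper1")))  # explicit + implicit
print(types(closure, ":paper1"))
```

Here `:paper1` explicitly carries only `:hasAuthor` and `:title`; reasoning adds the implicit property `:hasContributor` and the type `:Work`, which a schema discovery method that reads only explicit triples would never see. Production reasoners use indexed, incremental rule evaluation rather than this quadratic loop.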
In this work, we propose a first contribution towards a hybrid schema discovery approach capable of exploiting all the semantics of a data source, represented not only by its explicitly declared triples but also by the triples that can be inferred through reasoning. Considering both explicit and implicit properties improves the quality of the generated schema. We provide a scalable design of our approach that enables the processing of large RDF data sources while improving the quality of the results. We present experiments that demonstrate the efficiency of our proposal and the quality of the discovered schema.
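The claim that implicit properties can improve the discovered schema can be illustrated with a small sketch. It is a hypothetical simplification, not the paper's algorithm: it infers super-properties only (rdfs5 and rdfs7), describes each entity by its property set, and groups entities into candidate classes by single-linkage on Jaccard similarity with an assumed threshold of 0.5. The vocabulary and the threshold are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
SUB_PROPERTY = "rdfs:subPropertyOf"


def infer_super_properties(triples: Set[Triple]) -> Set[Triple]:
    """Add s q o for every s p o where p is a (transitive) sub-property of q."""
    parents: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == SUB_PROPERTY:
            parents.setdefault(s, set()).add(o)

    def ancestors(p: str) -> Set[str]:
        seen: Set[str] = set()
        stack = list(parents.get(p, ()))
        while stack:  # depth-first walk; `seen` guards against cycles
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(parents.get(q, ()))
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p != SUB_PROPERTY:
            for q in ancestors(p):
                inferred.add((s, q, o))
    return inferred


def property_sets(triples: Set[Triple]) -> Dict[str, FrozenSet[str]]:
    """Map each data subject to its set of properties (schema triples skipped)."""
    sets: Dict[str, Set[str]] = {}
    for s, p, _ in triples:
        if p != SUB_PROPERTY:
            sets.setdefault(s, set()).add(p)
    return {s: frozenset(ps) for s, ps in sets.items()}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    return len(a & b) / len(a | b)


def group_entities(sets: Dict[str, FrozenSet[str]], threshold: float) -> List[Set[str]]:
    """Connected components of the graph linking entities with Jaccard >= threshold."""
    parent = {e: e for e in sets}

    def find(e: str) -> str:  # union-find with path halving
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for a, b in combinations(sorted(sets), 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for e in sets:
        groups.setdefault(find(e), set()).add(e)
    return sorted(groups.values(), key=sorted)


# Hypothetical data: two works described with different sub-properties.
data = {
    (":hasAuthor", SUB_PROPERTY, ":hasContributor"),
    (":hasEditor", SUB_PROPERTY, ":hasContributor"),
    (":paper1", ":hasAuthor", ":alice"),
    (":paper1", ":title", '"Schema discovery"'),
    (":book1", ":hasEditor", ":bob"),
    (":book1", ":title", '"RDF in practice"'),
    (":alice", ":name", '"Alice"'),
}
explicit_groups = group_entities(property_sets(data), 0.5)
inferred_groups = group_entities(property_sets(infer_super_properties(data)), 0.5)
print(explicit_groups)
print(inferred_groups)
```

With explicit properties only, `:paper1` and `:book1` share one of three properties (Jaccard of about 0.33) and end up in separate candidate classes. After inference both also carry `:hasContributor`, so they share two of four properties (Jaccard 0.5) and are merged into a single class, while `:alice` stays on its own.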
Journal description:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between two related fields of interest: data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of data and knowledge systems.