Harmonising, Harvesting, and Searching Metadata Across a Repository Federation

Proceedings of the Conference on Research Data Infrastructure. Published: 2023-09-07. DOI: 10.52825/cordi.v1i.202

Steffen Neumann, Felix Bach, Leyla Jael Castro, Tillmann Fischer, Stefan Hofmann, Pei‐Chi Huang, Nicole Jung, Bhavin Katabathuni, Fabian Mauz, René Meier, V. C. Nainala, Noura Rayya, Christoph Steinbeck, O. Koepler


Abstract

The collection of metadata for research data is an important aspect of the FAIR principles. The schema.org and Bioschemas initiatives have created a vocabulary for embedding markup for many different types, including BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and others relevant to the natural and life sciences, with immediate benefits for the findability of data packages. To bridge the gap between semantic-web-driven JSON-LD metadata on the one hand and the established but separately developed interface services in libraries on the other, we have designed an architecture for harmonising, federating, and harvesting metadata from several resources. Our approach is to serve JSON-LD embedded in an XML container through a central OAI-Provider. Several resources in NFDI4Chem provide such domain-specific metadata. The CKAN-based NFDI4Chem search service, which has search capabilities relevant to the chemistry domain, can harvest this metadata using an OAI-PMH harvester extension that extracts the XML-encapsulated JSON-LD metadata. We invite the community to collaborate and reach a critical mass of providers and consumers in the NFDI.
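
To make the harvesting step concrete, the following is a minimal sketch of how a harvester might pull JSON-LD out of an OAI-PMH response. It is not the NFDI4Chem CKAN extension itself. The container element name, its namespace (`https://example.org/jsonld-container`), the record identifier, and the sample dataset are all hypothetical and chosen only for illustration; only the OAI-PMH namespace and the `record`/`header`/`metadata` structure come from the OAI-PMH protocol.

```python
import json
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical ListRecords response. The <container> element and its
# namespace are assumptions made for this sketch, not a published schema.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
      </header>
      <metadata>
        <container xmlns="https://example.org/jsonld-container"><![CDATA[
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example NMR dataset",
  "about": {
    "@type": "MolecularEntity",
    "molecularFormula": "C2H6O",
    "inChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
  }
}
        ]]></container>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_jsonld(oai_xml):
    """Map each OAI record identifier to its embedded JSON-LD document."""
    root = ET.fromstring(oai_xml)
    records = {}
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"
        )
        metadata = record.find(f"{{{OAI_NS}}}metadata")
        # Deleted records have a header but no metadata payload; skip them.
        if identifier is None or metadata is None or len(metadata) == 0:
            continue
        # Treat the first child of <metadata> as the XML container and
        # parse its text content as JSON-LD.
        container = metadata[0]
        payload = "".join(container.itertext()).strip()
        records[identifier] = json.loads(payload)
    return records


if __name__ == "__main__":
    for identifier, doc in extract_jsonld(SAMPLE_RESPONSE).items():
        print(identifier, doc["@type"], doc["about"]["inChIKey"])
```

Once the JSON-LD is parsed into a dictionary, chemistry-specific fields such as the InChIKey can be indexed by the search service; how the actual extension maps them onto CKAN fields is not described in the abstract.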


