Steffen Neumann, Felix Bach, Leyla Jael Castro, Tillmann Fischer, Stefan Hofmann, Pei‐Chi Huang, Nicole Jung, Bhavin Katabathuni, Fabian Mauz, René Meier, V. C. Nainala, Noura Rayya, Christoph Steinbeck, O. Koepler
{"title":"Harmonising, Harvesting, and Searching Metadata Across a Repository Federation","authors":"Steffen Neumann, Felix Bach, Leyla Jael Castro, Tillmann Fischer, Stefan Hofmann, Pei‐Chi Huang, Nicole Jung, Bhavin Katabathuni, Fabian Mauz, René Meier, V. C. Nainala, Noura Rayya, Christoph Steinbeck, O. Koepler","doi":"10.52825/cordi.v1i.202","DOIUrl":null,"url":null,"abstract":"The collection of metadata for research data is an important aspect in the FAIR principles. The schema.org and Bioschemas initiatives created a vocabulary to embed markup for many different types, including BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and others relevant in the Natural and Life Sciences with immediate benefits for findability of data packages. To bridge the gap between the worlds of semantic-web-driven JSON+LD metadata on the one hand, and established but separately developed interface services in libraries, we have designed an architecture for harmonising, federating and harvesting metadata from several resources. Our approach is to serve JSON+LD embedded in an XML container through a central OAI-Provider. Several resources in NFDI4Chem provide such domain-specific metadata. The CKAN-based NFDI4Chem search service can harvest this metadata using an OAI-PMH harvester extension that can extract the XML-encapsulated JSON+LD metadata, and has search capabilities relevant in the chemistry domain. We invite the community to collaborate and reach a critical mass of providers and consumers in the NFDI.","PeriodicalId":359879,"journal":{"name":"Proceedings of the Conference on Research Data Infrastructure","volume":"8 8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on Research Data Infrastructure","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52825/cordi.v1i.202","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The collection of metadata for research data is an important aspect in the FAIR principles. The schema.org and Bioschemas initiatives created a vocabulary to embed markup for many different types, including BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and others relevant in the Natural and Life Sciences with immediate benefits for findability of data packages. To bridge the gap between the worlds of semantic-web-driven JSON+LD metadata on the one hand, and established but separately developed interface services in libraries, we have designed an architecture for harmonising, federating and harvesting metadata from several resources. Our approach is to serve JSON+LD embedded in an XML container through a central OAI-Provider. Several resources in NFDI4Chem provide such domain-specific metadata. The CKAN-based NFDI4Chem search service can harvest this metadata using an OAI-PMH harvester extension that can extract the XML-encapsulated JSON+LD metadata, and has search capabilities relevant in the chemistry domain. We invite the community to collaborate and reach a critical mass of providers and consumers in the NFDI.