Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data
A. Rauber, Bernhard Gößwein, C. Zwölf, C. Schubert, Florian Wörister, James Duncan, Katharina Flicker, K. Zettsu, Kristof Meixner, L. McIntosh, R. Jenkyns, Stefan Pröll, Tomasz Miksa, M. Parsons
Issue 3.4, Fall 2021. Published 2021-10-28. Journal Article. DOI: https://doi.org/10.1162/99608f92.be565013
Citations: 6
Abstract
Precisely identifying arbitrary subsets of data so that they can be reproduced is a daunting challenge in data-driven science, all the more so if the underlying data source is dynamically evolving. Yet most settings exhibit exactly those characteristics: increasingly large amounts of data are continuously ingested from a range of sources, with error correction and quality improvement processes adding to the dynamics. For studies to be reproducible, for decision-making to be transparent, and for meta-studies to be performed conveniently, a precise identification mechanism to reference, retrieve, and work with such data is essential. The RDA Working Group on Dynamic Data Citation has published 14 recommendations centered on timestamping and versioning evolving data sources and on identifying subsets dynamically via persistent identifiers assigned to the queries that select the respective subsets. These principles are generic and work for virtually any kind of data. In the past few years, numerous repositories around the globe have implemented these recommendations and deployed solutions. This paper provides an overview of the recommendations, reference implementations, and pilot systems deployed, and analyses key lessons learned from these. This provides a solid
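The core idea summarized in the abstract (timestamp and version the data, then assign a persistent identifier to the query rather than to a copy of the subset) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not taken from the recommendations or from any reference implementation: the `VersionedStore` and `QueryStore` classes, the logical clock used in place of real timestamps, the single `min_value` filter standing in for an arbitrary query, and the `pid-` identifier format.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical append-only store: old values are closed out, never deleted."""
    rows: list = field(default_factory=list)
    clock: int = 0  # logical timestamp, a stand-in for a real clock

    def upsert(self, key, value):
        self.clock += 1
        # Mark the currently valid version of this key as superseded.
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = self.clock
        self.rows.append(
            {"key": key, "value": value, "valid_from": self.clock, "valid_to": None}
        )
        return self.clock

    def snapshot(self, ts):
        """Return the data exactly as it was valid at timestamp ts."""
        return {
            r["key"]: r["value"]
            for r in self.rows
            if r["valid_from"] <= ts and (r["valid_to"] is None or ts < r["valid_to"])
        }


class QueryStore:
    """Hypothetical query store: keeps the query, its timestamp, and a result checksum."""

    def __init__(self, store):
        self.store = store
        self.queries = {}

    def _run(self, min_value, ts):
        # Re-execute the query against the state of the data at ts.
        return {k: v for k, v in self.store.snapshot(ts).items() if v >= min_value}

    @staticmethod
    def _checksum(result):
        return hashlib.sha256(repr(sorted(result.items())).encode()).hexdigest()

    def cite(self, min_value):
        """Assign an identifier to the query as executed now; no data is copied."""
        ts = self.store.clock
        checksum = self._checksum(self._run(min_value, ts))
        pid = "pid-" + hashlib.sha256(f"{min_value}@{ts}".encode()).hexdigest()[:12]
        self.queries[pid] = (min_value, ts, checksum)
        return pid

    def resolve(self, pid):
        """Re-run the stored query at its stored timestamp and verify the result."""
        min_value, ts, checksum = self.queries[pid]
        result = self._run(min_value, ts)
        if self._checksum(result) != checksum:
            raise ValueError("subset no longer matches the cited result")
        return result


store = VersionedStore()
store.upsert("a", 10)
store.upsert("b", 20)
queries = QueryStore(store)
pid = queries.cite(min_value=15)  # cites the subset as of this moment
store.upsert("b", 5)              # a later correction changes the live data
print(queries.resolve(pid))       # still yields the subset as originally cited
```

In this sketch the identifier resolves to a stored query plus a timestamp, so the cited subset can be regenerated after later corrections without storing a separate copy of it. The checksum is one plausible way to detect that a re-executed query no longer returns the cited result.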