Testing Our Assumptions: Preliminary Results from the Data Curation Network

Elizabeth Coburn, L. Johnston
Journal of eScience Librarianship. Journal Article. Published 2020-09-09. DOI: 10.7191/jeslib.2020.1186 (https://doi.org/10.7191/jeslib.2020.1186)
Citations: 1

Abstract

Objective: Data curation is becoming widely accepted as a necessary component of data sharing. Yet, because there are so many different types of data with various curation needs, the Data Curation Network (DCN) project anticipated that a collaborative approach to data curation across a network of repositories would expand what any single institution might offer alone. Now, halfway through a three-year implementation phase, we’re testing our assumptions using one year of data from the DCN.

Methods: Ten institutions participated in the implementation phase of a shared staffing model for curating research data. For 12 months starting on January 1, 2019, we tracked the number of data sets submitted to the DCN, along with their file types and the disciplines they represented. Participating curators were matched to data sets based on their self-reported curation expertise. Aspects such as curation time, level of satisfaction with the assignment, and lack of appropriate expertise in the network were tracked and analyzed.

Results: Seventy-four data sets were submitted to the DCN in year one. Seventy-one of them were successfully curated by DCN curators. Each curation assignment takes 2.4 hours on average, and data sets take a median of three days to pass through the network. By analyzing the domains and file types of first-year submissions, we find that the network covers the submitted domains well and that our capacity is higher than the demand. We also observed, however, that the higher volume of data containing software code relied on certain curators' expertise more often than on others', creating a potential imbalance.

Conclusions: The data from year one of the DCN pilot have verified key assumptions about our collaborative approach to data curation. These results have also raised additional questions about capacity, equitable use of network resources, and sustained growth that we hope to answer by the end of this implementation phase.