{"title":"Computational Reproducibility: A Practical Framework for Data Curators","authors":"Sandra L. Sawchuk, Shahira Khair","doi":"10.7191/jeslib.2021.1206","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1206","url":null,"abstract":"Introduction: This paper presents concrete and actionable steps to guide researchers, data curators, and data managers in improving their understanding and practice of computational reproducibility.\n\nObjectives: Focusing on incremental progress rather than prescriptive rules, researchers and curators can build their knowledge and skills as the need arises. This paper presents a framework of incremental curation for reproducibility to support open science objectives.\n\nMethods: A computational reproducibility framework developed for the Canadian Data Curation Forum serves as the model for this approach. This framework combines learning about reproducibility with recommended steps for improving reproducibility.\n\nConclusion: Computational reproducibility leads to more transparent and accurate research. The authors warn that fear of a crisis and focus on perfection should not prevent curation that may be ‘good enough.’","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47061512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Demgenski, Sebastian Karcher, Dessi Dessi, N. Weber
{"title":"Introducing the Qualitative Data Repository's Curation Handbook","authors":"Robert Demgenski, Sebastian Karcher, Dessi Dessi, N. Weber","doi":"10.7191/jeslib.2021.1207","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1207","url":null,"abstract":"In this short practice paper, we introduce the public version of the Qualitative Data Repository’s (QDR) Curation Handbook. The Handbook documents and structures curation practices at QDR. We describe the background and genesis of the Handbook and highlight some of its key content.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43160300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hoa Q. Luong, Colleen Fallaw, Genevieve Schmitt, S. Braxton, Heidi J. Imker
{"title":"Responding to Reality: Evolving Curation Practices and Infrastructure at the University of Illinois at Urbana-Champaign","authors":"Hoa Q. Luong, Colleen Fallaw, Genevieve Schmitt, S. Braxton, Heidi J. Imker","doi":"10.7191/jeslib.2021.1202","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1202","url":null,"abstract":"Objective: The Illinois Data Bank provides Illinois researchers with the infrastructure to publish research data publicly. During a five-year review of the Research Data Service at the University of Illinois at Urbana-Champaign, the Illinois Data Bank was recognized as the most useful service offering in the unit. Internal metrics are captured and used to monitor growth, document curation workflows, and surface technical challenges faced as we assist our researchers. Here we present examples of these curation challenges and the solutions chosen to address them.\n\nMethods: Some Illinois Data Bank metrics are collected internally within the system, but most of the curation metrics reported here are tracked separately in a Google spreadsheet. The curator logs required information after curation is complete for each dataset. While the data are sometimes ambiguous (e.g., depending on researcher uptake of suggested actions), our curation data provide a general understanding about our data repository and have been useful in assessing our workflows and services. These metrics also help prioritize development needs for the Illinois Data Bank.\n\nResults and Conclusions: The curatorial services polish and improve the datasets, which contributes to the spirit of data reuse. Although we continue to see challenges in our processes, curation makes a positive impact on datasets. Continued development and adaptation of the technical infrastructure allows for an ever-better experience for the curators and users. These improvements have helped our repository more effectively support the data sharing process by successfully fostering depositor engagement with curators to improve datasets and facilitating easy transfer of very large files.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43296380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Plain Text & Character Encoding: A Primer for Data Curators","authors":"S. Erickson","doi":"10.7191/jeslib.2021.1211","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1211","url":null,"abstract":"Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability. Despite its ubiquity, plain text is not as plain as it may seem. The set of standards used in modern text encoding (principally, the Unicode Character Set and the related encoding format, UTF-8) have complex architectures when compared to historical standards like ASCII. Further, while the Unicode standard has gained in prominence, text encoding problems are not uncommon in research data curation. This primer provides conceptual foundations for modern text encoding and guidance for common curation and preservation actions related to textual data.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42225198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cynthia Hudson Vitale, J. Carlson, H. Hadley, L. Johnston
{"title":"Introduction to the Special JeSLIB Issue on Data Curation in Practice","authors":"Cynthia Hudson Vitale, J. Carlson, H. Hadley, L. Johnston","doi":"10.7191/jeslib.2021.1222","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1222","url":null,"abstract":"Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers to enrich the scholarly value and potential impact of their data through preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49435692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inna Kouper, Karen L. Tucker, Kevin Tharp, Mary Ellen van Booven, Ashley Clark
{"title":"Active Curation of Large Longitudinal Surveys: A Case Study","authors":"Inna Kouper, Karen L. Tucker, Kevin Tharp, Mary Ellen van Booven, Ashley Clark","doi":"10.7191/jeslib.2021.1210","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1210","url":null,"abstract":"In this paper we take an in-depth look at the curation of a large longitudinal survey and activities and procedures involved in moving the data from its generation to the state that is needed to conduct scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about the data objects that are “live,” i.e., when they are still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in the existing discussions on curation. On one hand, there is an acknowledged need for active or upstream curation as an engagement of curators close to the point of data creation. On the other hand, the recommendations on how to do that are scattered across multiple domain-oriented data efforts.\n\nIn describing the complexities of active curation of survey data and providing general recommendations we aim to draw attention to the practices of active curation, stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects, and encourage collaborations between libraries and other academic units.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46160925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Samuel, Michael Moore, Helenmary Sheridan, Chris Sorensen, Brandon Patterson
{"title":"Touring a Data Curation Network Primer: A Focus on Neuroimaging Data","authors":"S. Samuel, Michael Moore, Helenmary Sheridan, Chris Sorensen, Brandon Patterson","doi":"10.7191/jeslib.2021.1204","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1204","url":null,"abstract":"This video article provides an introduction to a data primer which leads data curators through the process of preparing a neuroimaging dataset for submission into a repository. A team of health sciences librarians and informationists created the primer which is focused on data from functional magnetic resonance images that are saved in either DICOM or NIfTI formats. The video walks through a flowchart discussing the process of preparing data sets to be deposited into a repository, key curatorial questions to ask for data that is highly sensitive, and how to suggest edits to this and other primers. The primer grew out of a data curation workshop hosted by the Data Curation Network.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47551933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not Forgetting – 80s Style","authors":"R. Raboin","doi":"10.7191/jeslib.2021.1223","DOIUrl":"https://doi.org/10.7191/jeslib.2021.1223","url":null,"abstract":"Keeping in mind the work done by data librarians is key to understanding the importance of providing open and free access to data. Standards such as persistent identifiers (PIDs) were created to provide long-lasting access to all types of digital materials and resources. Providing new ways to inform and instruct researchers and other users on the importance of making data available for sharing, reproducibility, and re-use helps in driving good and effective social policy for researchers.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45045284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kasey C. Soska, Melody Xu, Sandy L. Gonzalez, Orit Hertzberg, Catherine S Tamis-LeMonda, R. Gilmore, K. Adolph
{"title":"(Hyper)active Data Curation: A Video Case Study from Behavioral Science.","authors":"Kasey C. Soska, Melody Xu, Sandy L. Gonzalez, Orit Hertzberg, Catherine S Tamis-LeMonda, R. Gilmore, K. Adolph","doi":"10.31234/OSF.IO/89RCB","DOIUrl":"https://doi.org/10.31234/OSF.IO/89RCB","url":null,"abstract":"Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt \"hyperactive\" data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":"10 3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45481101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Teplitzky, Wynn Tranfield, Mea Warren, Philip White
{"title":"Introducing Reproducibility to Citation Analysis: a Case Study in the Earth Sciences","authors":"S. Teplitzky, Wynn Tranfield, Mea Warren, Philip White","doi":"10.7191/JESLIB.2021.1194","DOIUrl":"https://doi.org/10.7191/JESLIB.2021.1194","url":null,"abstract":"Objectives:\n\nReplicate methods from a 2019 study of Earth Science researcher citation practices.\n\nCalculate programmatically whether researchers in Earth Science rely on a smaller subset of literature than estimated by the 80/20 rule.\n\nDetermine whether these reproducible citation analysis methods can be used to analyze open access uptake.\n\nMethods: Replicated methods of a prior citation study provide an updated transparent, reproducible citation analysis protocol that can be replicated with Jupyter Notebooks.\n\nResults: This study replicated the prior citation study’s conclusions, and also adapted the author’s methods to analyze the citation practices of Earth Scientists at four institutions. We found that 80% of the citations could be accounted for by only 7.88% of journals, a key metric to help identify a core collection of titles in this discipline. We then demonstrated programmatically that 36% of these cited references were available as open access.\n\nConclusions: Jupyter Notebooks are a viable platform for disseminating replicable processes for citation analysis. A completely open methodology is emerging and we consider this a step forward. Adherence to the 80/20 rule aligned with institutional research output, but citation preferences are evident. Reproducible citation analysis methods may be used to analyze open access uptake; however, results are inconclusive. It is difficult to determine whether an article was open access at the time of citation, or became open access after an embargo.","PeriodicalId":90214,"journal":{"name":"Journal of escience librarianship","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43986015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}