{"title":"Practices, Challenges, and Prospects of Big Data Curation: a Case Study in Geoscience","authors":"Suzhen Chen, Bin Chen","doi":"10.2218/ijdc.v14i1.669","DOIUrl":"https://doi.org/10.2218/ijdc.v14i1.669","url":null,"abstract":"Open and persistent access to past, present, and future scientific data is fundamental for transparent and reproducible data-driven research. The scientific community now faces both challenges and opportunities arising from increasingly complex disciplinary data systems. Concerted efforts from domain experts, information professionals, and Internet technology experts are essential to ensure the accessibility and interoperability of big data. Here we review current practices in building and managing big data within the context of large data infrastructure, using geoscience cyberinfrastructure such as the Interdisciplinary Earth Data Alliance (IEDA) and EarthCube as a case study. Geoscience is a data-rich discipline with a rapid expansion of sophisticated and diverse digital data sets. Having begun to embrace the digital age, the community has applied big data and data mining tools to new types of research. We also identify current challenges, key elements, and prospects for constructing a more robust and future-proof big data infrastructure for research and publication, as well as the roles, qualifications, and opportunities for librarians/information professionals in the data era. Received 06 May 2019 | Accepted 11 September 2019 Correspondence should be addressed to Suzhen Chen, Cataloging Department, University of Hawaiʻi at Mānoa Library, 2550 McCarthy Mall, Honolulu, Hawaii 96822. Email: suzhen@hawaii.edu The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. 
URL: http://www.ijdc.net/ Copyright rests with the authors. This work is released under a Creative Commons Attribution Licence, version 4.0. For details please see https://creativecommons.org/licenses/by/4.0/ International Journal of Digital Curation 2020, Vol. 14, Iss. 1, 275–291. http://dx.doi.org/10.2218/ijdc.v14i1.669 DOI: 10.2218/ijdc.v14i1.669","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74991700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Reproducibility of LaTeX Documents by Enriching Figures with Embedded Scripts and Data","authors":"C. Jacobs","doi":"10.2218/ijdc.v14i1.656","DOIUrl":"https://doi.org/10.2218/ijdc.v14i1.656","url":null,"abstract":"The introduction of open access data policies by research councils, the enforcement of best practices, and the deployment of persistent online repositories have made the datasets that support results in scientific papers more widely accessible. Unfortunately, despite this advancement in the curation/publishing workflow, the data-driven figures within a paper often remain difficult to reproduce. Plotting or analysis scripts rarely accompany the manuscript or any associated software release, and even when they do, it may be unclear exactly which version was used. Furthermore, the precise commands and parameters used to execute the scripts are often not included in a README file or in the paper itself. This paper introduces a new open source digital curation tool, Pynea, for improving the reproducibility of LaTeX documents. Each figure within a document is enriched by automatically embedding the plotting script and data files required to generate it, so that readers of the paper can regenerate it in the future. The command used to execute the plotting script is also added to the figure’s metadata, along with details of the specific version of the script used (if the script is tracked with the Git version control system). If the document is recompiled after a figure, its plotting script, or its data files have changed, the figure is regenerated so that the author can be confident that the latest versions of the figure and its dependencies are included. Received 06 April 2019 | Revision received 30 June 2019 | Accepted 12 August 2019 Correspondence should be addressed to Dr Christian T. 
Jacobs, Defence Science and Technology Laboratory (Dstl), Porton Down, Salisbury, Wiltshire, SP4 0JQ, United Kingdom, Email: cjacobs@dstl.gov.uk The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/ Copyright rests with the authors. This work is released under a Creative Commons Attribution 4.0 International Licence. For details please see http://creativecommons.org/licenses/by/4.0/ International Journal of Digital Curation 2020, Vol. 14, Iss. 1, 292–302. https://doi.org/10.2218/ijdc.v14i1.656 DOI: 10.2218/ijdc.v14i1.656","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89799898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building an Aotearoa New Zealand-wide Digital Curation Community of Practice","authors":"Jessica Moran, Floran Feltham, Valerie Love","doi":"10.2218/ijdc.v14i1.638","DOIUrl":"https://doi.org/10.2218/ijdc.v14i1.638","url":null,"abstract":"How do you build awareness and capability for digital curation knowledge and experience across a country? The National Library of New Zealand has a statutory role in supporting and advancing the work of Aotearoa New Zealand libraries to ensure that documentary heritage and taonga are collected and preserved across the country’s memory system. This role includes supporting the collecting and curation of born-digital content. Aotearoa New Zealand’s Gallery Library Archive Museum (GLAM) sector is small but varied and diverse, and so requires a flexible and adaptive plan to grow experience and capability in this area. This paper describes the background research undertaken to gain a better understanding of the current environment, describes the development and delivery of pilot training in managing born-digital archival content, and outlines our next steps. 
Two foundational principles have driven this effort: 1) theory and practice are always in conversation with each other, and practical hands-on experience is as important as theoretical knowledge and understanding; and 2) the work of growing capability should be done in a spirit of collaboration and partnership, meeting each other as equals and learning from each other.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84251473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Class Focused Approach to Research Outputs and Policy Literature Metadata","authors":"Les Kneebone","doi":"10.2218/ijdc.v14i1.640","DOIUrl":"https://doi.org/10.2218/ijdc.v14i1.640","url":null,"abstract":"Successful research object sharing requires that systems and users understand the structure, semantics and rules that govern a given research object collection. A number of metadata standards define ontologies and vocabularies for consistent expression of research object semantics. Supporting, clarifying and sometimes extending these standards are metadata application profiles (MAPs). MAPs play a key role in defining metadata element cardinality and data types. Where metadata standards have not already specified them in formal range declarations, MAPs may also mandate or recommend controlled vocabularies, encoding schemes, and semantics that are to be consumed by external systems. MAPs also guide design options for in-house systems and workflows. In this paper, the development of a draft MAP for grey-literature policy and research collections is discussed. The discussion focuses on considerations around the selection and adoption of metadata standards, given the research data and literature communities in the APO stakeholder map. This paper presents a work-in-progress version of a Dublin Core Application Profile (DCAP) candidate. The Analysis & Policy Observatory Metadata Application Profile (APO-MAP) takes research object class structure as a starting point and considers class model options, especially given the availability of registry services and Persistent Identifier (PID) systems. 
The discussion finds that MAP development progresses towards a best fit that balances the need to adopt widely supported standards, local business drivers, and community acceptance.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78206614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research Data Management in a cultural heritage organisation","authors":"T. Drysdale","doi":"10.2218/ijdc.v14i1.647","DOIUrl":"https://doi.org/10.2218/ijdc.v14i1.647","url":null,"abstract":"Research is a core function of cultural heritage organisations. Inevitably, the undertaking of research by galleries, libraries, archives and museums (the GLAM sector) leads to the creation of vast quantities of research data. Yet despite growing recognition that research data must be managed if it is to be exploited effectively, and in spite of increasing understanding of research data management practices and needs, particularly in the higher education sector, knowledge of research data management in cultural heritage organisations remains extremely limited. This paper represents an attempt to address the limited awareness of research data management in the cultural heritage sector. It presents the results of a data management audit conducted at Historic Royal Palaces (HRP) in 2018. The study reveals that research data management at HRP is underdeveloped, while highlighting some causes for optimism. The results of the study are compared to the results of similar studies conducted in UK higher education institutions (HEIs), highlighting the many discrepancies in the ways that research data is managed at HRP and in the HE sector. Recognition of these differences and similarities, it is argued, is necessary for the development of better research data management practices and tools for the heritage sector. Received 15 January 2019 | Revision received 09 August 2019 | Accepted 09 August 2019 Correspondence should be addressed to Tom Drysdale, 4B The Casemates, HM Tower of London, EC3N 4AB. Email: tom.drysdale@hrp.org.uk The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. 
The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/ Copyright rests with the authors. This work is released under a Creative Commons Attribution Licence, version 4.0. For details please see https://creativecommons.org/licenses/by/4.0/ International Journal of Digital Curation 2019, Vol. 14, Iss. 1, 199–227. http://dx.doi.org/10.2218/ijdc.v14i1.647 DOI: 10.2218/ijdc.v14i1.647","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86036160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding Analytics within the Curation of Scientific Workflows.","authors":"Gerard Weatherby, Michael R Gryk","doi":"10.2218/ijdc.v15i1.709","DOIUrl":"https://doi.org/10.2218/ijdc.v15i1.709","url":null,"abstract":"This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis. Over the past several years, the Center has been developing and extending computational workflow management software for use by a community of biomolecular NMR spectroscopists. Previous work refactored the workflow system to use the PREMIS framework for reporting retrospective provenance, for sharing workflows between scientists, and for supporting data reuse. In this paper, we report on our recent efforts to embed analytics within workflow execution and provenance tracking. Important metrics for each of the intermediate datasets are included within the corresponding PREMIS intellectual object, which allows both inspection of the operation of individual actors and visualization of the changes throughout a full processing workflow. These metrics can be viewed within the workflow management system or through standalone metadata widgets. Our approach supports a hybrid of automated workflow execution with manual intervention and metadata management. 
In this combination, the workflow system and metadata widgets encourage domain experts to be avid curators of the data they create, fostering both computational reproducibility and scientific data reuse.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7990377/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25517216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding a Repository with the Help of Machine-Actionable DMPs: Opportunities and Challenges","authors":"Simon Oblasser, Tomasz Miksa, A. Kitamoto","doi":"10.2218/ijdc.v15i1.704","DOIUrl":"https://doi.org/10.2218/ijdc.v15i1.704","url":null,"abstract":"","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73375961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Out of the Jar into the World! A Case Study on Storing and Sharing Vertebrate Data","authors":"S. Borda","doi":"10.2218/ijdc.v15i1.700","DOIUrl":"https://doi.org/10.2218/ijdc.v15i1.700","url":null,"abstract":"","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77759950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of the History, Advocacy and Efficacy of Data Management Plans","authors":"Nicholas Andrew Smale, Kathryn Unsworth, G. Denyer, Elise Magatova, Daniel Barr","doi":"10.2218/ijdc.v15i1.525","DOIUrl":"https://doi.org/10.2218/ijdc.v15i1.525","url":null,"abstract":"","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75910752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piloting a Community of Student Data Consultants that Supports and Enhances Research Data Services","authors":"Jonathan S. Briganti, A. Ogier, Anne-Marie Brown","doi":"10.2218/ijdc.v15i1.723","DOIUrl":"https://doi.org/10.2218/ijdc.v15i1.723","url":null,"abstract":"Research ecosystems within university environments are continuously evolving and require more resources and domain specialists to assist with the data lifecycle. Typically, academic researchers and professionals are overcommitted, which makes it challenging to stay up to date on recent developments in best practices for data management, curation, transformation, analysis, and visualization. Recently, research groups, university core centers, and Libraries have been revitalizing these services to fill the gaps and help researchers find new tools and approaches that make their work more impactful, sustainable, and replicable. In this paper, we report on a student consultation program built within the University Libraries that takes an innovative, student-centered approach to meeting research data needs in a university environment while also providing students with experiential learning opportunities. This student program, DataBridge, trains students to work in multi-disciplinary teams and as student consultants to assist faculty, staff, and students with their real-world, data-intensive research challenges. Centering DataBridge in the Libraries gives students the unique opportunity to work across all disciplines, on problems and in domains that some students may not encounter during their college careers. To encourage students from multiple disciplines to participate, we developed a scaffolded curriculum that allows students from any discipline and skill level to quickly develop the essential data science skill sets and begin contributing their own unique perspectives and specializations to the research consultations. 
These students, mentored by Informatics faculty in the Libraries, provide research support that can ultimately impact the entire research process. Through our pilot phase, we have found that DataBridge enhances the utilization and openness of data created through research, extends the reach and impact of the work beyond the researcher’s specialized community, and creates a network of student “data champions” across the University who see the value in working with the Library. Here, we describe the evolution of the DataBridge program and outline its unique role both in training the data stewards of the future in FAIR data practices and in contributing significant value to research projects at Virginia Tech. Ultimately, this work highlights the need for innovative, strategic programs that encourage and enable real-world experience of data curation, data analysis, and data publication for current researchers, all while training the next generation of researchers in these best practices.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88559220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}