{"title":"Digital Object Identifiers (DOIs) Prove Highly Effective for Long-Term Data Availability in PLOS ONE","authors":"Hilary Jasmin","doi":"10.18438/eblip30378","DOIUrl":null,"url":null,"abstract":"A Review of: Federer, L. M. (2022). Long-term availability of data associated with articles in PLOS ONE. PLOS ONE 17(8), Article e0272845. https://doi.org/10.1371/journal.pone.0272845 Objective – To retrieve a range of PLOS ONE data availability statements and quantify their ability to point to the study data efficiently and accurately. Research questions focused on availability over time, availability of URLs versus DOIs, the ability to locate resources using the data availability statement and availability based on data sharing method. Design – Observational study. Setting – PLOS ONE archive. Subjects – A corpus of 47,593 data availability statements from research articles in PLOS ONE between March 1, 2014, and May 31, 2016. Methods – Use of custom R scripts to retrieve 47,593 data availability statements; of these, 6,912 (14.5%) contained at least one URL or DOI. Once these links were extracted, R scripts were run to fetch the resources and record HTTP status codes to determine if the resource was discoverable. To address the potential for the DOI or URL to fetch but not actually contain the appropriate data, the researchers selected at random and manually retrieved the data for 350 URLs and 350 DOIs. Main Results – Of the unique URLs, 75% were able to be automatically retrieved by custom R scripts. In the manual sample of 350 URLs, which was used to test for accuracy of the URLs in containing the data, there was a 78% retrieval rate. Of the unique DOIs, 90% were able to be automatically retrieved by custom R scripts. The manual sample of 350 DOIs had a 98% retrieval rate. Conclusion – DOIs, especially those linked with a repository, had the highest rate of success in retrieving the data attached to the article. While URLs were better than no link at all, URLs are susceptible to content drift and need more management for long-term data availability.","PeriodicalId":45227,"journal":{"name":"Evidence Based Library and Information Practice","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Evidence Based Library and Information Practice","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18438/eblip30378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 0
Abstract
A Review of: Federer, L. M. (2022). Long-term availability of data associated with articles in PLOS ONE. PLOS ONE 17(8), Article e0272845. https://doi.org/10.1371/journal.pone.0272845 Objective – To retrieve a range of PLOS ONE data availability statements and quantify their ability to point to the study data efficiently and accurately. Research questions focused on availability over time, availability of URLs versus DOIs, the ability to locate resources using the data availability statement and availability based on data sharing method. Design – Observational study. Setting – PLOS ONE archive. Subjects – A corpus of 47,593 data availability statements from research articles in PLOS ONE between March 1, 2014, and May 31, 2016. Methods – Use of custom R scripts to retrieve 47,593 data availability statements; of these, 6,912 (14.5%) contained at least one URL or DOI. Once these links were extracted, R scripts were run to fetch the resources and record HTTP status codes to determine if the resource was discoverable. To address the potential for the DOI or URL to fetch but not actually contain the appropriate data, the researchers selected at random and manually retrieved the data for 350 URLs and 350 DOIs. Main Results – Of the unique URLs, 75% were able to be automatically retrieved by custom R scripts. In the manual sample of 350 URLs, which was used to test for accuracy of the URLs in containing the data, there was a 78% retrieval rate. Of the unique DOIs, 90% were able to be automatically retrieved by custom R scripts. The manual sample of 350 DOIs had a 98% retrieval rate. Conclusion – DOIs, especially those linked with a repository, had the highest rate of success in retrieving the data attached to the article. While URLs were better than no link at all, URLs are susceptible to content drift and need more management for long-term data availability.