Toward reliable biodiversity dataset references

Ecol. Informatics. Pub Date: 2020-01-03. DOI: 10.32942/osf.io/mysfp

Michael Elliott, J. Poelen, J. Fortes


Citations: 10

Abstract

No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability. From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL, by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, for the URL-based dataset references available in each of the aggregators' data provider registries, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable. We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets to enable long-term accessibility of the referenced datasets.
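To make the idea of a content-based identifier concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name a hash function or an identifier syntax, so the choice of SHA-256, the `hash://sha256/` prefix, and the helper names `content_id` and `classify` are all assumptions made for illustration. The classification rule is likewise a simplification of the "unresponsive" and "unstable" categories reported above.

```python
import hashlib
from typing import Optional, Sequence


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes alone (assumed format, SHA-256)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def classify(observations: Sequence[Optional[bytes]]) -> str:
    """Label one URL from its periodic observations.

    Each observation is the retrieved content, or None when the URL did
    not respond. This labeling rule is an illustrative assumption.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    if any(obs is None for obs in observations):
        return "unresponsive"
    # Identical content always yields the same identifier, so more than
    # one distinct identifier means the content changed between visits.
    distinct_ids = {content_id(obs) for obs in observations}
    return "stable" if len(distinct_ids) == 1 else "unstable"


if __name__ == "__main__":
    snapshot = b"occurrenceID,scientificName\n1,Puma concolor\n"
    print(content_id(snapshot))
    print(classify([snapshot, snapshot]))
    print(classify([snapshot, snapshot + b"2,Lynx rufus\n"]))
    print(classify([snapshot, None]))
```

Because the identifier depends only on the bytes, any copy of a dataset held by any archive can be checked against it, which is what allows the same reference to be resolved from more than one location.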
