On the Long Tails of Specimen Data
Arturo H. Ariño
Biodiversity Information Science and Standards, published 2023-09-07. DOI: 10.3897/biss.7.112151
Citations: 0
Abstract
A recent article by K.R. Johnson and I.F.P. Owens in Science (Johnson and Owens 2023) suggested that the 73 main natural history museums around the world collectively hold over 1 billion records of accessioned "specimens" (taken as collection units). This result is remarkably close to the estimate published more than a decade earlier by A.H. Ariño in Biodiversity Informatics (Ariño 2010), although the two were obtained through completely different methods. Both approaches benefitted from information available through the Global Biodiversity Information Facility (GBIF), which has grown by an order of magnitude in the intervening years, although mostly through observation-based occurrences rather than through the accretion of specimen records from collections. Comparing the estimated size of collections with the amount of digital data from those collections still reveals a huge gap, as it did in 2010. Digitization efforts have progressed, but they remain far from the goal of bringing information about all specimens into the digital domain.
While larger institutions doubtless have greater overall resources than smaller ones to make their data available, how do the two compare in terms of data mobilization and sharing? Not surprisingly, the distribution of collection sizes shows a long tail of small institutions that are nonetheless also embarking on digitization. Will this long tail of science actually manage to make all of its biodiversity data available sooner than the larger institutions? It is increasingly recognized that data usability depends on data being findable, accessible, interoperable and reusable (FAIR; Wilkinson et al. 2016). What could be the consequences of a data availability bias whereby many tiny collections are available for ready use, but only a much smaller (although surely very significant) fraction of larger collections of a comparable type?
This presentation explores and compares the distribution of potential versus readily available data in 2010 and in 2023, examines what trends may exist in the race towards universal availability of specimen data, and asks whether digitization efforts could be better targeted to achieve greater overall scientific benefit.
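The contrast between potential and readily available data can be illustrated with a minimal Python sketch. It assumes a hypothetical list of institutions, each with an estimated number of specimens held and a number of specimen records already available online. All names and figures are invented for illustration. The size cut-off separating "small" from "large" holders is likewise an arbitrary assumption, not the method used in this presentation.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str        # hypothetical institution label
    holdings: int    # estimated specimens (collection units) held
    digitized: int   # specimen records already available online (hypothetical)


def summarize(collections, small_threshold):
    """Compare potential vs readily available records for small and large holders.

    small_threshold is an arbitrary, assumed cut-off on holdings.
    """
    groups = {"small": [], "large": []}
    for c in collections:
        key = "small" if c.holdings < small_threshold else "large"
        groups[key].append(c)

    summary = {}
    for key, members in groups.items():
        held = sum(c.holdings for c in members)
        online = sum(c.digitized for c in members)
        summary[key] = {
            "institutions": len(members),
            "held": held,
            "digitized": online,
            # Share of the group's potential data that is readily available.
            "fraction_digitized": online / held if held else 0.0,
            # Institutions whose entire holdings are already digitized.
            "fully_digitized": sum(1 for c in members if c.digitized >= c.holdings),
        }
    return summary


# Invented figures, for illustration only.
collections = [
    Collection("Large A", 10_000_000, 1_500_000),
    Collection("Large B", 5_000_000, 1_000_000),
    Collection("Small 1", 20_000, 20_000),
    Collection("Small 2", 15_000, 15_000),
    Collection("Small 3", 8_000, 2_000),
]

summary = summarize(collections, small_threshold=100_000)
for group, stats in summary.items():
    print(group, stats)
```

With these toy figures, two of the three small holders are fully digitized, yet together the small group contributes far fewer available records than the two partially digitized large institutions. This is the kind of availability bias the questions above refer to.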