Treating gaps and biases in biodiversity data as a missing data problem
Diana E Bowler, Robin J Boyd, Corey T Callaghan, Robert A Robinson, Nick J B Isaac, Michael J O Pocock
Biological Reviews. Published 2024-08-08. DOI: 10.1111/brv.13127 (https://doi.org/10.1111/brv.13127)
Abstract
Big biodiversity data sets have great potential for monitoring and research because of their large taxonomic, geographic and temporal scope. Such data sets have become especially important for assessing temporal changes in species' populations and distributions. Gaps in the available data, especially spatial and temporal gaps, often mean that the data are not representative of the target population. This hinders drawing large-scale inferences, such as about species' trends, and may lead to misplaced conservation action. Here, we conceptualise gaps in biodiversity monitoring data as a missing data problem, which provides a unifying framework for the challenges and potential solutions across different types of biodiversity data sets. We characterise the typical types of data gaps as different classes of missing data and then use missing data theory to explore the implications for questions about species' trends and factors affecting occurrences/abundances. By using this framework, we show that bias due to data gaps can arise when the factors affecting sampling and/or data availability overlap with those affecting species. But a data set per se is not biased. The outcome depends on the ecological question and statistical approach, which determine choices around which sources of variation are taken into account. We argue that typical approaches to long-term species trend modelling using monitoring data are especially susceptible to data gaps since such models do not tend to account for the factors driving missingness. To identify general solutions to this problem, we review empirical studies and use simulation studies to compare some of the most frequently employed approaches to deal with data gaps, including subsampling, weighting and imputation. All these methods have the potential to reduce bias but may come at the cost of increased uncertainty of parameter estimates. 
Weighting techniques are arguably the least used so far in ecology and have the potential to reduce both the bias and variance of parameter estimates. Regardless of the method, the ability to reduce bias critically depends on knowledge of, and the availability of data on, the factors creating data gaps. We use this review to outline the necessary considerations when dealing with data gaps at different stages of the data collection and analysis workflow.
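The weighting idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation study: the `urban` covariate, the occupancy and sampling probabilities, and all function names are hypothetical choices made for illustration. It assumes one binary factor drives both species occupancy and the chance that a site is sampled, and that the sampling probabilities are known exactly (in practice they would have to be estimated). Under those assumptions, the naive mean over sampled sites is biased, while an inverse-probability-weighted mean is approximately unbiased.

```python
import random


def simulate(n_sites=20000, seed=1):
    """Simulate sites where one factor affects both occupancy and sampling."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        urban = rng.random() < 0.5            # hypothetical site covariate
        p_occ = 0.2 if urban else 0.6         # species is rarer at urban sites
        p_sample = 0.8 if urban else 0.2      # urban sites are sampled more often
        occupied = rng.random() < p_occ
        sampled = rng.random() < p_sample     # unsampled sites are the data gaps
        sites.append((urban, occupied, sampled, p_sample))
    return sites


def naive_estimate(sites):
    """Mean occupancy over sampled sites only, ignoring why sites are missing."""
    observed = [occ for _, occ, sampled, _ in sites if sampled]
    return sum(observed) / len(observed)


def ipw_estimate(sites):
    """Inverse-probability-weighted mean: each sampled site is weighted by
    1 / (its sampling probability), normalised by the sum of the weights."""
    num = sum(occ / p for _, occ, sampled, p in sites if sampled)
    den = sum(1 / p for _, _, sampled, p in sites if sampled)
    return num / den


sites = simulate()
truth = sum(occ for _, occ, _, _ in sites) / len(sites)  # uses all sites
naive = naive_estimate(sites)
weighted = ipw_estimate(sites)

print(f"true occupancy:     {truth:.3f}")
print(f"naive estimate:     {naive:.3f}")
print(f"weighted estimate:  {weighted:.3f}")
```

With these assumed probabilities, the expected true occupancy is 0.40, whereas the naive estimate is expected to be about 0.28 because four in five sampled sites are urban. The weighted estimate recovers the target only because the factor creating the gaps is known and recorded, which mirrors the abstract's point that reducing bias depends on knowledge of, and data on, the factors creating data gaps.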
About the journal:
Biological Reviews is a scientific journal that covers a wide range of topics in the biological sciences. It publishes several review articles per issue, which are aimed at both non-specialist biologists and researchers in the field. The articles are scholarly and include extensive bibliographies. Authors are instructed to be aware of the diverse readership and write their articles accordingly.
The reviews in Biological Reviews serve as comprehensive introductions to specific fields, presenting the current state of the art and highlighting gaps in knowledge. Each article can be up to 20,000 words long and includes an abstract, a thorough introduction, and a statement of conclusions.
The journal focuses on publishing synthetic reviews, which are based on existing literature and address important biological questions. These reviews are interesting to a broad readership and are timely, often related to fast-moving fields or new discoveries. A key aspect of a synthetic review is that it goes beyond simply compiling information and instead analyzes the collected data to create a new theoretical or conceptual framework that can significantly impact the field.
Biological Reviews is abstracted and indexed in various databases, including Abstracts on Hygiene & Communicable Diseases, Academic Search, AgBiotech News & Information, AgBiotechNet, AGRICOLA Database, GeoRef, Global Health, SCOPUS, Weed Abstracts, and Reaction Citation Index, among others.