Canran Liu, Graeme Newell, Matt White, Josephine Machunter
Improving the estimation of the Boyce index using statistical smoothing methods for evaluating species distribution models with presence-only data
Ecography, published 2024-10-16. DOI: 10.1111/ecog.07218 (https://doi.org/10.1111/ecog.07218)
Abstract
Species distribution models (SDMs) underpin a wide range of decisions concerning biodiversity. Although SDMs can be built using presence-only data, rigorous evaluation of these models remains challenging. One evaluation method is the Boyce index (BI), which uses the relative frequencies between presence sites and background sites within a series of bins or moving windows spanning the entire range of predicted values from the SDM. Obtaining accurate estimates of the BI using these methods relies upon having a large number of presences, which is often not feasible, particularly for rare or restricted species that are often the focus of modelling. Wider application of the BI requires a method that can accurately and reliably estimate the BI using small numbers of presence records. In this study, we investigated the effectiveness of five statistical smoothing methods (i.e. thin plate regression splines, cubic regression splines, B-splines, P-splines and adaptive smoothers) and the mean of these five methods (denoted as 'mean') to estimate the BI. We simulated 600 species with varying prevalence and built distribution models using random forest and Maxent methods. For training data, we used two levels for the number of presences (NP_train: 20 and 500), along with 2 × NP_train and 10000 random points (i.e. random background sites) for each modelling method. We used the number of presences at four levels (NP_bi: 1000, 200, 50 and 10) to investigate its effect, together with 5000 random points to calculate the BI. Our results indicate that the BI estimates from the binning and moving window methods are severely affected by the decrease of NP_bi, but all the estimates of the BI from smoothing-based methods were almost always unbiased for realistic situations. Hence, we recommend these methods for estimating the BI for evaluating SDMs when verified absence data are unavailable.
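As a rough illustration of the binning approach mentioned in the abstract, the Python sketch below computes a binned BI. It assumes the commonly used definition of the BI as the Spearman rank correlation between the predicted-to-expected (P/E) ratio in each bin and the bin midpoint. The function names, the choice of ten equal-width bins, and the simulated data are illustrative assumptions and are not taken from the paper; the smoothing-based estimators studied by the authors are not reproduced here.

```python
import numpy as np


def _average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def binned_boyce_index(presence_pred, background_pred, n_bins=10):
    """Hypothetical helper: binned Boyce index from SDM predictions.

    presence_pred   -- predicted values at presence sites
    background_pred -- predicted values at random background sites
    """
    presence_pred = np.asarray(presence_pred, dtype=float)
    background_pred = np.asarray(background_pred, dtype=float)

    # Equal-width bins spanning the full range of predicted values.
    all_pred = np.concatenate([presence_pred, background_pred])
    edges = np.linspace(all_pred.min(), all_pred.max(), n_bins + 1)

    # Relative frequency of presences (P) and background points (E) per bin.
    p_share = np.histogram(presence_pred, bins=edges)[0] / len(presence_pred)
    e_share = np.histogram(background_pred, bins=edges)[0] / len(background_pred)

    # The P/E ratio is undefined where a bin holds no background points.
    keep = e_share > 0
    if keep.sum() < 3:
        return float("nan")
    pe_ratio = p_share[keep] / e_share[keep]
    midpoints = ((edges[:-1] + edges[1:]) / 2)[keep]

    # Spearman correlation = Pearson correlation of the ranks.
    r = np.corrcoef(_average_ranks(midpoints), _average_ranks(pe_ratio))[0, 1]
    return float(r)


# Simulated example (assumption): presences are sampled in proportion to
# predicted suitability, so the P/E ratio should rise with the prediction.
rng = np.random.default_rng(0)
background = rng.uniform(0, 1, 5000)
candidates = rng.uniform(0, 1, 20000)
accepted = candidates[rng.uniform(0, 1, candidates.size) < candidates]
presences = accepted[:1000]

bi = binned_boyce_index(presences, background)
print(f"Binned Boyce index with {presences.size} presences: {bi:.3f}")
```

Re-running the example with `accepted[:10]` instead of `accepted[:1000]` leaves most bins without any presences, which gives an informal sense of why the abstract reports that bin-based estimates deteriorate as the number of presences falls.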
About the journal:
ECOGRAPHY publishes exciting, novel, and important articles that significantly advance understanding of ecological or biodiversity patterns in space or time. Papers focusing on conservation or restoration are welcomed, provided they are anchored in ecological theory and convey a general message that goes beyond a single case study. We encourage papers that seek to advance the field through the development and testing of theory or methodology, or by proposing new tools for analysis or interpretation of ecological phenomena. Manuscripts are expected to address general principles in ecology, though they may do so using a specific model system if they adequately frame the problem relative to a generalized ecological question or problem.
Purely descriptive papers are considered only if breaking new ground and/or describing patterns seldom explored. Studies focused on a single species or single location are generally discouraged unless they make a significant contribution to advancing general theory or understanding of biodiversity patterns and processes. Manuscripts merely confirming or marginally extending results of previous work are unlikely to be considered in Ecography.
Papers are judged by virtue of their originality, appeal to general interest, and their contribution to new developments in studies of spatial and temporal ecological patterns. There are no biases with regard to taxon, biome, or biogeographical area.