Title: Do Random Forest-Driven Climate Envelope Models Require Variable Selection? A Case Study on Crustulina guttata (Theridiidae: Araneae)
Authors: Tae-Sung Kwon, Won Il Choi, Min-Jung Kim
Journal: Insects, vol. 16, no. 2
Publication date: 2025-02-14
Publication type: Journal Article
DOI: 10.3390/insects16020209 (https://doi.org/10.3390/insects16020209)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11857067/pdf/
Impact factor: 2.7 (JCR Q1, Entomology)
Region category: Agricultural and Forestry Sciences (region 2)
Citations: 0
Abstract
Climate Envelope Models (CEMs) commonly employ 19 bioclimatic variables to predict species distributions, yet selecting which variables to include remains a critical challenge. Although it seems logical to select ecologically relevant variables, the biological responses of many target species are poorly understood. Random Forest (RF), a popular method in CEMs, can effectively handle correlated and nonlinear variables. In light of these strengths, this study explores the full model hypothesis, which involves using all 19 bioclimatic variables in an RF model, with Crustulina guttata (Theridiidae: Araneae) as a test case. Four model variants (a simplified model with two variables, an ecologically selected model with seven variables, a statistically selected model with ten variables, and a full model with nineteen variables) were compared against a thousand randomly assembled models with matching variable counts. All models achieved high performance, though results varied with the number of variables employed. Notably, the full model consistently produced stronger predictions than models with fewer variables. Moreover, specifying particular variables did not yield a significant advantage over random selections of equally sized sets, indicating that omitting variables may risk the loss of important information. Although the final model suggests that C. guttata may have dispersed beyond its native European range through artificial means, this study examined only a single species. Thus, caution is warranted in generalizing these findings, and additional research is needed to determine whether the full model hypothesis extends to other taxa and environmental contexts. In scenarios where ecological knowledge is limited, however, using all available variables in an RF model may preserve potentially significant predictors and enhance predictive accuracy.
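The comparison against randomly assembled models described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `toy_score` is a placeholder that stands in for fitting an RF on a variable subset and returning a held-out performance metric, the indices in `chosen` are hypothetical, and the function names are assumptions introduced here.

```python
import random
from typing import Callable, List, Tuple

N_BIOCLIM = 19  # the 19 bioclimatic variables, indexed 1..19

Subset = Tuple[int, ...]


def random_subset_scores(
    score_fn: Callable[[Subset], float],
    k: int,
    n_draws: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Score n_draws randomly assembled variable sets, each of size k."""
    rng = random.Random(seed)
    variables = range(1, N_BIOCLIM + 1)
    return [
        score_fn(tuple(sorted(rng.sample(variables, k))))
        for _ in range(n_draws)
    ]


def share_at_least(null_scores: List[float], observed: float) -> float:
    """Fraction of random models scoring at least as well as the chosen one."""
    return sum(s >= observed for s in null_scores) / len(null_scores)


def toy_score(subset: Subset) -> float:
    """Placeholder scorer (assumption, not the paper's data or metric).

    A real version would fit an RF on the listed variables and return a
    held-out performance metric. Here the score simply grows with the
    number of variables, plus a small deterministic subset-specific term.
    """
    jitter = (sum(v * v for v in subset) % 7) / 700.0
    return 0.90 + 0.004 * len(subset) + jitter


# Hypothetical 7-variable selection; the paper's actual choice is not given here.
chosen = (1, 4, 5, 6, 12, 15, 18)
null = random_subset_scores(toy_score, k=len(chosen))
print(share_at_least(null, toy_score(chosen)))
```

If a large share of the random sets score at least as well as the chosen set, the selection offers no clear advantage over random sets of the same size, which is the pattern the abstract reports. Replacing `toy_score` with a function that actually fits and evaluates an RF would turn this sketch into a real test.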
Insects (Agricultural and Biological Sciences: Insect Science)
CiteScore
5.10
Self-citation rate
10.00%
Articles published
1013
Review time
21.77 days
About the journal:
Insects (ISSN 2075-4450) is an international, peer-reviewed open access journal of entomology published by MDPI online quarterly. It publishes reviews, research papers and communications related to the biology, physiology and the behavior of insects and arthropods. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.