Do Random Forest-Driven Climate Envelope Models Require Variable Selection? A Case Study on Crustulina guttata (Theridiidae: Araneae).

IF 2.7 · Zone 2 (Agricultural and Forestry Sciences) · JCR Q1 (Entomology)
Insects Pub Date : 2025-02-14 DOI:10.3390/insects16020209
Tae-Sung Kwon, Won Il Choi, Min-Jung Kim
Insects, vol. 16, no. 2 · Journal Article · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11857067/pdf/

Abstract

Climate Envelope Models (CEMs) commonly employ 19 bioclimatic variables to predict species distributions, yet choosing which variables to include remains a critical challenge. Although selecting ecologically relevant variables seems logical, the biological responses of many target species are poorly understood. Random Forest (RF), a popular method in CEMs, can effectively handle correlated predictors and nonlinear responses. In light of these strengths, this study explores the full model hypothesis, that is, using all 19 bioclimatic variables in an RF model, with Crustulina guttata (Theridiidae: Araneae) as a test case. Four model variants (a simplified model with two variables, an ecologically selected model with seven variables, a statistically selected model with ten variables, and a full model with nineteen variables) were compared against a thousand randomly assembled models with matching variable counts. All models achieved high performance, though results varied with the number of variables employed. Notably, the full model consistently produced stronger predictions than models with fewer variables. Moreover, specifying particular variables did not yield a significant advantage over random selections of equally sized sets, indicating that omitting variables risks losing important information. The final model suggests that C. guttata may have dispersed beyond its native European range through artificial means. Because this study examined only a single species, caution is warranted in generalizing these findings, and additional research is needed to determine whether the full model hypothesis extends to other taxa and environmental contexts. In scenarios where ecological knowledge is limited, however, using all available variables in an RF model may preserve potentially significant predictors and enhance predictive accuracy.
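The comparison described in the abstract, a chosen variable set scored against randomly assembled sets with the same variable count, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `random_subset_benchmark`, the variable labels `bio1` to `bio19`, and the scorer `toy_score` are all hypothetical. In a real analysis the scorer would be something like a cross-validated performance measure of an RF model fitted on the given variables.

```python
import random
from typing import Callable, Dict, Sequence


def random_subset_benchmark(
    all_vars: Sequence[str],
    chosen: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    n_random: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Compare a chosen variable set with random sets of the same size."""
    rng = random.Random(seed)
    k = len(chosen)
    chosen_score = score_fn(chosen)
    # Reference distribution: scores of randomly assembled k-variable sets.
    null_scores = [
        score_fn(rng.sample(list(all_vars), k)) for _ in range(n_random)
    ]
    # Share of random sets that score at least as well as the chosen set.
    frac_ge = sum(s >= chosen_score for s in null_scores) / n_random
    return {
        "k": k,
        "chosen_score": chosen_score,
        "null_mean": sum(null_scores) / n_random,
        "frac_random_ge": frac_ge,
    }


# Assumed labels for the 19 bioclimatic variables (hypothetical naming).
BIOCLIM = [f"bio{i}" for i in range(1, 20)]


def toy_score(variables: Sequence[str]) -> float:
    # Hypothetical stand-in for a model score: it depends only on how many
    # distinct variables are used, not on which ones. Replace with a real
    # cross-validated RF score in practice.
    return 0.80 + 0.01 * len(set(variables))


if __name__ == "__main__":
    result = random_subset_benchmark(
        BIOCLIM, ["bio1", "bio12"], toy_score, n_random=100
    )
    print(result)
```

A value of `frac_random_ge` close to 1 would mean that random sets of the same size do about as well as the chosen set, which is the pattern the abstract reports for the selected models. With the toy scorer this holds by construction, so the numbers carry no empirical meaning.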

Source journal
Insects
Insects Agricultural and Biological Sciences-Insect Science
CiteScore: 5.10
Self-citation rate: 10.00%
Articles published: 1013
Review time: 21.77 days
Journal description: Insects (ISSN 2075-4450) is an international, peer-reviewed open access journal of entomology published by MDPI online quarterly. It publishes reviews, research papers and communications related to the biology, physiology and the behavior of insects and arthropods. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.