Do Random Forest-Driven Climate Envelope Models Require Variable Selection? A Case Study on Crustulina guttata (Theridiidae: Araneae).

IF 2.7 | Zone 2: Agricultural and Forestry Sciences | Q1 Entomology
Insects Pub Date : 2025-02-14 DOI:10.3390/insects16020209
Tae-Sung Kwon, Won Il Choi, Min-Jung Kim
Citations: 0

Abstract


Climate Envelope Models (CEMs) commonly employ 19 bioclimatic variables to predict species distributions, yet selecting which variables to include remains a critical challenge. Although it seems logical to select ecologically relevant variables, the biological responses of many target species are poorly understood. Random Forest (RF), a popular method in CEMs, can effectively handle correlated variables and nonlinear responses. In light of these strengths, this study explores the full model hypothesis, which involves using all 19 bioclimatic variables in an RF model, with Crustulina guttata (Theridiidae: Araneae) as a test case. Four model variants (a simplified model with two variables, an ecologically selected model with seven variables, a statistically selected model with ten variables, and a full model with nineteen variables) were compared against a thousand randomly assembled models with matching variable counts. All models achieved high performance, though results varied with the number of variables employed. Notably, the full model consistently produced stronger predictions than models with fewer variables. Moreover, specifying particular variables did not yield a significant advantage over random selections of equally sized sets, indicating that omitting variables may risk the loss of important information. Although the final model suggests that C. guttata may have dispersed beyond its native European range through artificial means, this study examined only a single species. Thus, caution is warranted in generalizing these findings, and additional research is needed to determine whether the full model hypothesis extends to other taxa and environmental contexts. In scenarios where ecological knowledge is limited, however, using all available variables in an RF model may preserve potentially significant predictors and enhance predictive accuracy.
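
The comparison described in the abstract, a chosen variable set scored against randomly assembled sets of the same size, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins for the 19 bioclimatic variables, the function names `subset_auc` and `compare_to_random` are hypothetical, and the choice of cross-validated AUC, 50 trees, and 20 random sets (the study used a thousand) is made only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def subset_auc(X, y, columns, seed=0):
    """Mean cross-validated AUC of an RF model restricted to `columns`."""
    # Assumption: AUC with 3-fold CV; the abstract does not name the metric.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    scores = cross_val_score(model, X[:, columns], y, cv=3, scoring="roc_auc")
    return scores.mean()


def compare_to_random(X, y, chosen, n_random=20, seed=0):
    """Score a chosen variable set against random sets of the same size."""
    rng = np.random.default_rng(seed)
    chosen_auc = subset_auc(X, y, list(chosen), seed)
    random_aucs = np.array([
        subset_auc(
            X, y,
            rng.choice(X.shape[1], size=len(chosen), replace=False),
            seed,
        )
        for _ in range(n_random)
    ])
    # Fraction of random sets that score at least as well as the chosen set.
    frac_at_least = float(np.mean(random_aucs >= chosen_auc))
    return chosen_auc, random_aucs, frac_at_least


# Synthetic stand-in data: 19 columns play the role of bioclimatic variables,
# and presence depends on two of them (one nonlinearly) plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 19))
signal = X[:, 0] + 0.5 * X[:, 5] ** 2 + rng.normal(scale=0.5, size=300)
y = (signal > 0.5).astype(int)

chosen_auc, random_aucs, frac = compare_to_random(X, y, chosen=[0, 5, 11])
full_auc = subset_auc(X, y, list(range(19)))
print(f"chosen: {chosen_auc:.3f}  random mean: {random_aucs.mean():.3f}  "
      f"full: {full_auc:.3f}  random >= chosen: {frac:.2f}")
```

With real occurrence data, `X` would hold the bioclimatic values at presence and background points, and the same call would be repeated for each subset size (two, seven, ten) alongside the full 19-variable model.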

Source journal: Insects
Subject area: Agricultural and Biological Sciences, Insect Science
CiteScore: 5.10
Self-citation rate: 10.00%
Articles published: 1013
Review time: 21.77 days
Journal description: Insects (ISSN 2075-4450) is an international, peer-reviewed open access journal of entomology published by MDPI online quarterly. It publishes reviews, research papers and communications related to the biology, physiology, and behavior of insects and arthropods. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.