Does machine learning outperform logistic regression in predicting individual tree mortality?

IF 7.3 | Zone 2 (Environmental Science and Ecology) | JCR Q1 (Ecology)
Aitor Vázquez-Veloso, Astor Toraño Caicoya, Felipe Bravo, Peter Biber, Enno Uhl, Hans Pretzsch
Ecological Informatics, volume 88, Article 103140. Published 2025-04-12. DOI: 10.1016/j.ecoinf.2025.103140. Source: https://www.sciencedirect.com/science/article/pii/S1574954125001499
Citations: 0

Abstract

Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors such as competition, drought, and pathogens drive tree mortality, but the underlying mechanism is challenging to model. Ongoing environmental change complicates modeling further, because it influences and alters all the factors involved in mortality. However, innovative classification algorithms can mine the data for patterns that model, or even explain, these relationships. We use logistic binomial regression as the reference algorithm for predicting individual tree mortality. However, machine learning (ML) alternatives already applied to other forest modeling topics can also serve this purpose. Here, we compare the performance of five ML algorithms (Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine) against logistic binomial regression in individual tree mortality classification across 40 case studies and one cross-validation case study. The data come from long-term experimental plots of Norway spruce, comprising 75,522 tree records with an average mortality rate of 10.28%. Across the case studies, overall performance improved as expected when more variables were used, whereas larger datasets lowered the algorithms' performance. Performance was also higher on unmanaged plots than on thinned ones. Random Forest outperformed the other algorithms in every case except cross-validation, where it was the weakest. Our results demonstrate the potential of ML in assessing tree mortality. When the model application is not clearly defined and/or model interpretability is needed, logistic binomial regression is still the best tool for evaluating individual tree mortality.
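The kind of comparison the abstract describes can be sketched in a few lines of Python with scikit-learn. This is only an illustration under stated assumptions: the abstract does not say which software, hyperparameters, predictor variables, or performance metric the authors used, so the synthetic dataset, the function name `compare_classifiers`, the default model settings, and the choice of balanced accuracy as the score are all hypothetical stand-ins rather than the study's method.

```python
# Illustrative sketch only: synthetic data, not the Norway spruce records from the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(n_samples=2000, seed=0):
    """Return the mean cross-validated balanced accuracy of each classifier."""
    # Synthetic binary target with roughly 10% positives ("dead" trees),
    # loosely mirroring the ~10% mortality rate quoted in the abstract.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=8,
        n_informative=4,
        weights=[0.9, 0.1],
        random_state=seed,
    )
    # Default-ish settings (an assumption); distance- and margin-based
    # models get standardized inputs.
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "naive_bayes": GaussianNB(),
        "k_nearest_neighbours": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        "support_vector_machine": make_pipeline(StandardScaler(), SVC()),
    }
    # Stratified folds keep the class ratio similar in every split.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    ranked = sorted(compare_classifiers().items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name:24s} balanced accuracy = {score:.3f}")
```

Balanced accuracy is used here because, with about one dead tree in ten, plain accuracy would reward a model that predicts "alive" for every record. The ranking this sketch prints on synthetic data says nothing about the ranking reported in the paper.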
Source journal
Ecological Informatics (Environmental Science, Ecology)
CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days
About the journal: Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need for informing sustainable management in view of global environmental and climate change. The journal is interdisciplinary, at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and modelling and prediction of ecological data.