Does machine learning outperform logistic regression in predicting individual tree mortality?

IF 7.3 · Zone 2 (Environmental Science and Ecology) · JCR Q1 (Ecology)

Ecological Informatics Pub Date : 2025-04-12 DOI:10.1016/j.ecoinf.2025.103140

Aitor Vázquez-Veloso , Astor Toraño Caicoya , Felipe Bravo , Peter Biber , Enno Uhl , Hans Pretzsch

Ecological Informatics, volume 88, Article 103140. Full text: https://www.sciencedirect.com/science/article/pii/S1574954125001499

Citations: 0

Abstract



Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors like competition, drought, and pathogens drive tree mortality, but the underlying mechanism is challenging to model. The current environmental changes are even complicating model approaches as they influence and alter all the factors involving mortality. However, innovative classification algorithms can go deep into data to find patterns that can model or even explain their relationship. We use Logistic binomial Regression as the reference algorithm for predicting individual tree mortality. However, different machine learning (ML) alternatives already applied to other forest modeling topics can be used for this purpose. Here, we compare the performance of five different ML algorithms (Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine) against Logistic binomial Regression in individual tree mortality classification under 40 different case studies and a cross-validation case study. The data used corresponds to Norway spruce long-term experimental plots, which have a total of 75,522 tree records and a 10.28 % mortality rate on average. Through different case studies, when more variables were used, general performance improved as expected, while more extensive datasets decreased the performance level of the algorithms. Performance was also higher when plots remained without management compared to thinned ones. Random Forest outperformed the other algorithms in all the cases except cross-validation, where it was the weaker one. Our results demonstrate the potential of ML in assessing tree mortality. When the model application is not clearly defined and/or model interpretability is needed, Logistic binomial Regression is still the best tool for evaluating individual tree mortality.
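The kind of comparison the abstract describes, a logistic regression baseline scored against an ML classifier on dead/alive labels, can be sketched in a few lines. The example below is only an illustration and is not the authors' pipeline: the data are synthetic, the two predictors (`rel_dbh`, `competition`), the coefficients, the choice of k = 25, and the use of AUC as the score are all assumptions that do not come from the paper. Only K-Nearest Neighbour stands in for the five ML algorithms, and because the synthetic labels are drawn from a logistic process, the setup favours the logistic model by construction.

```python
import heapq
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_trees(n, seed=0):
    """Synthetic tree records: ((rel_dbh, competition), dead). All values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rel_dbh = rng.gauss(0.0, 1.0)      # standardized diameter (assumed predictor)
        competition = rng.gauss(0.0, 1.0)  # standardized competition index (assumed predictor)
        # Assumed process: small, crowded trees die more often; intercept chosen
        # so that the overall mortality rate is roughly 10%.
        p_dead = sigmoid(-2.8 - 1.0 * rel_dbh + 1.0 * competition)
        rows.append(((rel_dbh, competition), 1 if rng.random() < p_dead else 0))
    return rows


def fit_logistic(rows, lr=1.0, epochs=300):
    """Fit a two-predictor logistic regression by full-batch gradient descent."""
    w, b, n = [0.0, 0.0], 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[i] - lr * gw[i] / n for i in range(2)]
        b -= lr * gb / n
    return w, b


def knn_score(train, x, k=25):
    """Share of dead trees among the k nearest training records (Euclidean distance)."""
    nearest = heapq.nsmallest(
        k, train, key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2
    )
    return sum(y for _, y in nearest) / k


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare(n=3000, seed=0):
    """Train both models on two thirds of the records and score them on the rest."""
    rows = make_trees(n, seed)
    split = (2 * n) // 3
    train, test = rows[:split], rows[split:]
    labels = [y for _, y in test]
    w, b = fit_logistic(train)
    lr_scores = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x, _ in test]
    knn_scores = [knn_score(train, x) for x, _ in test]
    return {
        "mortality_rate": sum(y for _, y in rows) / n,
        "auc_logistic": auc(lr_scores, labels),
        "auc_knn": auc(knn_scores, labels),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.3f}")
```

A threshold-free score such as AUC is used here because, with only about one tree in ten dying, a classifier that predicts "alive" for every tree would already reach roughly 90% plain accuracy. In practice one would more likely rely on an established library than on hand-written models.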


Source journal

Ecological Informatics (Environmental Sciences: Ecology)

CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days

About the journal: The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data as well as the critical need for informing sustainable management in view of global environmental and climate change. The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.