Robustness of Optimized Decision Tree-Based Machine Learning Models to Map Gully Erosion Vulnerability

Impact factor: 3.5 · JCR Q2 (Soil Science)

Soil Systems · Pub Date: 2023-05-16 · DOI: 10.3390/soilsystems7020050

Hasna Eloudi, Mohammed Hssaisoune, H. Reddad, M. Namous, Maryem Ismaili, S. Krimissa, Mustapha Ouayah, L. Bouchaou


Citations: 2

Abstract



Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. This study evaluates the performance and robustness of six decision tree-based ensemble machine learning models, namely Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs), and AdaBoost, for mapping and predicting gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly split into five train/test ratios (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. All of the models performed well in predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, across the random splits of the database, these models exhibit small but noticeable instability, with high performance for the 80/20 and 70/30 splits. This demonstrates the significance of database refining and the need to test several data splits in order to ensure efficient and reliable results.
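The splitting protocol and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names `split_indices` and `auc`, the fixed seed, and the floor-based cut are assumptions made for illustration, and only the 217 gully points and the five ratios come from the abstract. Any non-gully samples and the models themselves are not described there and are not modelled here.

```python
import random

def split_indices(n, train_pct, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test lists.

    train_pct is an integer percentage; the cut is floored (an assumption).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n * train_pct // 100
    return idx[:cut], idx[cut:]

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

N_POINTS = 217                      # inventory size stated in the abstract
TRAIN_PCTS = [50, 60, 70, 80, 90]   # the five train/test ratios

split_sizes = {}
for pct in TRAIN_PCTS:
    train, test = split_indices(N_POINTS, pct)
    split_sizes[pct] = (len(train), len(test))
    print(f"{pct}/{100 - pct}: train={len(train)}, test={len(test)}")

# Toy AUC check on four hypothetical points (not data from the study).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under these assumptions the 90/10 split leaves only 22 of the 217 points for testing, so a metric computed on it rests on very few samples. That is one plausible reason to compare several ratios rather than rely on a single split.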


Source journal

Soil Systems (Earth and Planetary Sciences: Earth-Surface Processes)

CiteScore: 5.30
Self-citation rate: 5.70%
Articles published: (not listed)
Review time: 11 weeks