Hasna Eloudi, Mohammed Hssaisoune, H. Reddad, M. Namous, Maryem Ismaili, S. Krimissa, Mustapha Ouayah, L. Bouchaou
{"title":"基于优化决策树的机器学习模型在沟蚀脆弱性映射中的稳健性","authors":"Hasna Eloudi, Mohammed Hssaisoune, H. Reddad, M. Namous, Maryem Ismaili, S. Krimissa, Mustapha Ouayah, L. Bouchaou","doi":"10.3390/soilsystems7020050","DOIUrl":null,"url":null,"abstract":"Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. The purpose of this research is to evaluate the performance and robustness of six machine learning ensemble models based on the decision tree principle: Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs) and Adaboost, in order to map and predict gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly subdivided into five percentages of Train/Test (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. The results revealed that all of the models used performed well in terms of predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, according to the random subdivisions of the database, these models exhibit small but noticeable instability, with high performance for the 80/20% and 70/30% subdivisions. This demonstrates the significance of database refining and the need to test various splitting data in order to ensure efficient and reliable output results.","PeriodicalId":21908,"journal":{"name":"Soil Systems","volume":" ","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2023-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Robustness of Optimized Decision Tree-Based Machine Learning Models to Map Gully Erosion Vulnerability\",\"authors\":\"Hasna Eloudi, Mohammed Hssaisoune, H. Reddad, M. Namous, Maryem Ismaili, S. Krimissa, Mustapha Ouayah, L. Bouchaou\",\"doi\":\"10.3390/soilsystems7020050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. The purpose of this research is to evaluate the performance and robustness of six machine learning ensemble models based on the decision tree principle: Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs) and Adaboost, in order to map and predict gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly subdivided into five percentages of Train/Test (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. The results revealed that all of the models used performed well in terms of predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, according to the random subdivisions of the database, these models exhibit small but noticeable instability, with high performance for the 80/20% and 70/30% subdivisions. This demonstrates the significance of database refining and the need to test various splitting data in order to ensure efficient and reliable output results.\",\"PeriodicalId\":21908,\"journal\":{\"name\":\"Soil Systems\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2023-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Soil Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/soilsystems7020050\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SOIL SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Soil Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/soilsystems7020050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOIL SCIENCE","Score":null,"Total":0}
Robustness of Optimized Decision Tree-Based Machine Learning Models to Map Gully Erosion Vulnerability
Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. The purpose of this research is to evaluate the performance and robustness of six machine learning ensemble models based on the decision tree principle: Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs) and Adaboost, in order to map and predict gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly subdivided into five percentages of Train/Test (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. The results revealed that all of the models used performed well in terms of predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, according to the random subdivisions of the database, these models exhibit small but noticeable instability, with high performance for the 80/20% and 70/30% subdivisions. This demonstrates the significance of database refining and the need to test various splitting data in order to ensure efficient and reliable output results.