An interpretable (explainable) model based on machine learning and SHAP interpretation technique for mapping wind erosion hazard
Authors: Hamid Gholami, Ehsan Darvishi, Navazollah Moradi, Aliakbar Mohammadifar, Yougui Song, Yue Li, Baicheng Niu, Dimitris Kaskaoutis, Biswajeet Pradhan
Journal: Environmental Science and Pollution Research (impact factor 5.8)
Publication date: 2024-11-15
DOI: 10.1007/s11356-024-35521-x
Citation count: 0
Abstract
Soil erosion by wind poses a significant threat to many regions across the globe, including drylands in the Middle East and Iran. Wind erosion hazard maps help identify the regions at highest risk of wind erosion and are a valuable tool for mitigating its destructive consequences. This study aims to map wind erosion hazard by developing an interpretable (explainable) model based on machine learning (ML) and the SHapley Additive exPlanations (SHAP) interpretation technique. Four ML models were used: random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), and quadratic discriminant analysis (QDA). Thirteen features associated with wind erosion were mapped spatially and subjected to a multivariate adaptive regression spline (MARS) feature selection algorithm; tolerance coefficient (TC) and variance inflation factor (VIF) statistical tests were then used to check for multicollinearity among the variables. The MARS analysis showed that eight features, namely elevation (or DEM), soil bulk density, precipitation, aspect, slope, soil sand content, vegetation cover (or NDVI), and lithology, were the most effective features for wind erosion, and no collinearity existed among them. The ML models were used to rank the effective features, and the research introduces the application of an interpretable ML model for interpreting the predictive model's output. The feature ranking by RF, taken as the most typical ML model, revealed that elevation and soil bulk density were the two most important features. According to the area under the receiver operating characteristic curve (AUROC; value > 90%) and the precision-recall (PR) curve (value > 90%), all four ML models performed with high accuracy. According to the PR curve, the SVM model performed slightly better than the others, and its results revealed that 20.9%, 23%, and 16.6% of the total area of Hormozgan Province fall in the moderate, high, and very high wind erosion hazard classes, respectively. SHAP revealed that soil sand content and elevation are the most important variables contributing to the predictive model output. Overall, our research is one of the pioneering applications of interpretable ML models to mapping wind erosion hazard in southern Iran. We recommend that future research address interpretability in order to better understand predictive model outputs.
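The abstract does not say how the SHAP values were computed, so the following is only a minimal sketch of the underlying idea: exact Shapley values for a single sample, obtained by enumerating every feature subset. This brute-force approach is not the approximation used by the SHAP library, and the toy scoring function, the three feature names, and all numeric values are hypothetical illustrations rather than results from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature subsets.

    Features outside a subset are "switched off" by setting them to the
    baseline value (an assumption of this sketch).
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                # Marginal contribution of `name` to this coalition
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_hazard_score(f):
    """Hypothetical hazard score; NOT the model from the paper."""
    return 0.5 * f["sand"] - 0.3 * f["elevation"] + 0.2 * f["sand"] * f["ndvi"]


# Hypothetical, normalised feature values for one map cell and a reference cell
x = {"sand": 0.8, "elevation": 0.2, "ndvi": 0.1}
baseline = {"sand": 0.5, "elevation": 0.5, "ndvi": 0.5}

phi = shapley_values(toy_hazard_score, x, baseline)
for feature, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {value:+.3f}")
```

The contributions add up to the difference between the score for the sample and the score for the baseline, which is the additive property that lets SHAP attribute a prediction to individual features. The enumeration grows exponentially with the number of features, so it is only practical for small feature sets such as the eight retained here.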
About the journal:
Environmental Science and Pollution Research (ESPR) serves the international community in all areas of Environmental Science and related subjects with emphasis on chemical compounds. This includes:
- Terrestrial Biology and Ecology
- Aquatic Biology and Ecology
- Atmospheric Chemistry
- Environmental Microbiology/Biobased Energy Sources
- Phytoremediation and Ecosystem Restoration
- Environmental Analyses and Monitoring
- Assessment of Risks and Interactions of Pollutants in the Environment
- Conservation Biology and Sustainable Agriculture
- Impact of Chemicals/Pollutants on Human and Animal Health
It reports from a broad interdisciplinary outlook.