Interpretation techniques to explain the output of a spatial land subsidence hazard model in an area with a diverted tributary
Razieh Seihani, Hamid Gholami, Yahya Esmaeilpour, Alireza Kamali, Maryam Zareh
Applied Computing and Geosciences, volume 23, Article 100191. Published 2024-08-24.
DOI: 10.1016/j.acags.2024.100191
https://www.sciencedirect.com/science/article/pii/S2590197424000387
Abstract
Because the machine learning (ML) models used for spatial modelling of environmental and natural hazards are black boxes, their predictive outputs need to be interpreted. For this purpose, we applied four interpretation techniques, namely the interaction plot, the permutation feature importance (PFI) measure, the Shapley additive explanations (SHAP) decision plot, and the accumulated local effects (ALE) plot, to explain the output of an ML model used to map land subsidence (LS) in the Nazdasht plain, Hormozgan province, southern Iran. We used a stepwise regression (SR) algorithm to select important features, and five ML models to map the LS hazard: Cforest (a conditional random forest), generalized linear model (GLM), multivariate linear regression (MLR), partial least squares (PLS) and extreme gradient boosting (XGBoost). The interpretation techniques were then used to explain the output of the spatial ML hazard model. The GLM was the most accurate model for mapping LS in the study area, and 24.3% of the total study area had a very high susceptibility to the LS hazard. According to the interpretation techniques, land use, elevation, groundwater level and vegetation were the most important variables controlling the LS hazard, and also the variables contributing most to the model's output. Overall, human activities, especially the diversion five decades ago of one of the main tributaries that fed the plain and recharged its groundwater, have intensified current LS occurrence. Therefore, management activities such as water spreading projects upstream of the plain could help mitigate LS occurrence in the plain.
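As a rough illustration of one of the techniques named in the abstract, the following sketch computes permutation feature importance (PFI) for a toy model: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance. This is not the authors' code or data. The feature names, the weights in `toy_model`, and the synthetic inputs are all assumptions chosen for illustration, and a fixed linear function stands in for a fitted hazard model.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column is left untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical feature names; "noise" is a deliberately irrelevant column.
FEATURES = ["land_use", "elevation", "groundwater_level", "noise"]

# Synthetic stand-in inputs (uniform random values), NOT the study's data.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in FEATURES] for _ in range(300)]


def toy_model(row):
    """Stand-in for a fitted model; the weights are illustrative assumptions."""
    return 3.0 * row[0] + 2.0 * row[1] + 1.0 * row[2] + 0.0 * row[3]


# Targets are generated from the toy model itself, so the baseline error is zero.
y = [toy_model(r) for r in X]

scores = permutation_importance(toy_model, X, y)
ranking = sorted(zip(FEATURES, scores), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the ranking simply recovers the order of the assumed weights, and the irrelevant column scores zero because the model ignores it. With a real fitted model, the scores should be computed on held-out data, and strongly correlated features can share or mask each other's importance.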