Environmental variables controlling soil thickness across elevation zones in the eastern Himalayas
Xin Zhang, Jianrong Fan, Hongjin Chen
Catena, Volume 261, Article 109526 (published 2025-10-11)
DOI: 10.1016/j.catena.2025.109526
https://www.sciencedirect.com/science/article/pii/S0341816225008288
Citation count: 0
Abstract
Soil thickness (ST) is a critical indicator of ecological processes such as water regulation, nutrient cycling, carbon storage, and vegetation restoration. However, digital mapping of ST is often hindered by prediction bias and by difficulty in identifying driving factors, largely because right-censored data are prevalent. Here, we investigate the eastern Himalayas (500–7000 m a.s.l.) using 130 field profiles and a suite of environmental variables spanning lithology, vegetation, climate, topography, spectral indices, and soil properties. We propose a composite framework that integrates the inverse probability of censoring weighted random forest (IPCW-RF) with Shapley additive explanations (SHAP) analysis. The IPCW component corrects censoring bias, the RF model provides high-precision spatial prediction, and SHAP yields quantitative insights into the mechanisms shaping ST at global, local, and spatial scales. The IPCW-RF model, trained with climate, topographic, and Sentinel-2 spectral variables, achieved strong predictive performance (R² = 0.55; MAE = 18.09 cm; RMSE = 21.59 cm) under 5-fold nearest-neighbor distance matching cross-validation (CV). Areas with high ST values were concentrated in the Yarlung Tsangpo River valley and on gently sloping plateau surfaces. The relationship between ST and elevation followed a nonlinear but structured pattern: ST decreased significantly between 500 and 2500 m, alternated between positive and negative correlations from 2500 to 5000 m, and declined again above 5000 m. SHAP analysis revealed an elevation-dependent contribution of environmental factors, with Sentinel-2 Band 8 (near-infrared) emerging as the dominant predictor overall. At low to mid elevations (500–2500 m), vegetation played the primary role; at mid to high elevations (2500–4500 m), both vegetation and topography were influential; and at high elevations (>4500 m), topographic controls predominated.
This study demonstrates the integration of an interpretable machine-learning framework with censored data, offering new insights into soil formation processes and improving spatial prediction of ST in complex plateau terrains.
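The censoring correction named in the abstract can be illustrated with a small sketch. This is not the authors' code, and the abstract does not spell out their exact weighting scheme; it assumes the common IPCW form in which a profile that reached bedrock (uncensored, event = 1) gets weight 1/Ĝ(T−), where Ĝ is the Kaplan-Meier estimate of the probability that a profile has not yet been censored just before depth T, and a profile where digging stopped before bedrock (right-censored, event = 0) gets weight 0. The function names `censoring_survival`, `ipcw_weights`, and `g_left` and the toy depths are hypothetical.

```python
from collections import Counter


def censoring_survival(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate of G(t) = P(C > t), the censoring survival function.

    Censoring (event == 0) is treated as the 'event' here. Returns ascending
    (t_j, G(t_j)) step points, one per distinct censoring depth. Ties are handled
    by counting every profile with depth >= t_j as at risk (one common convention).
    """
    censored_at = Counter(t for t, e in zip(times, events) if e == 0)
    steps = []
    g = 1.0
    for t in sorted(censored_at):
        at_risk = sum(1 for s in times if s >= t)
        g *= 1.0 - censored_at[t] / at_risk
        steps.append((t, g))
    return steps


def g_left(steps: list[tuple[float, float]], t: float) -> float:
    """G(t-): the censoring survival just before t (censoring depths strictly < t)."""
    g = 1.0
    for tj, gj in steps:
        if tj < t:
            g = gj
        else:
            break
    return g


def ipcw_weights(times: list[float], events: list[int]) -> list[float]:
    """IPCW weights: 1 / G(T_i-) for uncensored profiles, 0 for censored ones."""
    steps = censoring_survival(times, events)
    # For an uncensored profile, every censoring depth t_j < T_i has the profile
    # itself in the risk set, so G(T_i-) > 0 and the division is safe.
    return [1.0 / g_left(steps, t) if e == 1 else 0.0 for t, e in zip(times, events)]


if __name__ == "__main__":
    # Hypothetical toy profiles (depth in cm); not data from the study.
    depths = [30, 45, 60, 60, 80, 100, 100, 120]
    reached_bedrock = [1, 1, 0, 1, 1, 0, 1, 0]
    print(ipcw_weights(depths, reached_bedrock))
```

Uncensored profiles deeper than a censoring depth are up-weighted to stand in for the censored profiles that dropped out at that depth. In a scikit-learn workflow, for example, such weights could be passed as `RandomForestRegressor().fit(X, y, sample_weight=w)` so that only uncensored profiles drive the forest; the abstract does not state which software the study used.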
About the journal:
Catena publishes papers describing original field and laboratory investigations and reviews on geoecology and landscape evolution with emphasis on interdisciplinary aspects of soil science, hydrology and geomorphology. It aims to disseminate new knowledge and foster better understanding of the physical environment, of evolutionary sequences that have resulted in past and current landscapes, and of the natural processes that are likely to determine the fate of our terrestrial environment.
Papers within any one of the above topics are welcome provided they are of sufficiently wide interest and relevance.