An enhanced soil salinity estimation method for arid regions using multisource remote sensing data and advanced feature selection
Aihepa Aihaiti, Ilyas Nurmemet, Xinru Yu, Yilizhati Aili, Shiqin Li, Xiaobo Lv, Yu Qin
Catena, Volume 256, Article 109116. Journal article, published 2025-05-08.
DOI: 10.1016/j.catena.2025.109116
https://www.sciencedirect.com/science/article/pii/S0341816225004187
Citations: 0
Abstract
Accurate soil salinity monitoring is crucial for sustainable soil use and management. While most existing studies rely on optical remote sensing for salinity estimation, the potential of polarimetric synthetic aperture radar (PolSAR) data, particularly its polarimetric decomposition characteristics, remains underexplored. This study focuses on the Yutian Oasis in southern Xinjiang, China, and investigates the potential of PolSAR data for estimating soil salinity in arid regions by integrating multi-source remote sensing data (RADARSAT-2 C-band SAR, Sentinel-2, and topographic data). From the multi-source dataset, 121 features were extracted, and correlation analysis identified 52 variables significantly correlated (P < 0.05) with soil electrical conductivity (EC). To reduce dimensionality and collinearity, these variables were further screened with three feature selection algorithms: Recursive Feature Elimination (RFE), Boruta, and Variable Importance in Projection (VIP). Three machine learning models were then used to build soil salinity inversion models: Multi-Layer Perceptron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The Boruta-MLP model outperformed the other strategies in both the calibration and validation phases, indicating strong generalization. On the validation set, the Boruta-MLP model achieved an R² of 0.819, with an RMSE of 5.767 and an MAE of 3.800. Variable sensitivity analysis indicated that several SAR-derived features play crucial roles in soil salinity estimation: backscatter indices (the backscatter cross-polarization ratio σ⁰_VV/σ⁰_VH, the radar vegetation index RVI_σ⁰, and the volume scattering index VSI_σ⁰), polarimetric decomposition components (Alpha, Entropy, MF4CF_theta_FP), and texture features (Contrast_σ⁰_VH, Dissimilarity_σ⁰_VH, and Homogeneity_σ⁰_VH).
This research underscores the critical role of SAR data and advanced feature selection in soil salinity estimation, offering a robust framework for arid-region salinity mapping through multi-source data integration and machine learning optimization.
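The backscatter indices named in the abstract (σ⁰_VV/σ⁰_VH, RVI_σ⁰, VSI_σ⁰) are not defined there. The sketch below is hypothetical and rests on several assumptions. It takes calibrated σ⁰ values in dB, computes ratios in linear power, and uses commonly cited quad-pol forms: RVI = 8σ⁰_VH/(σ⁰_HH + σ⁰_VV + 2σ⁰_VH) and VSI = σ⁰_VH/(σ⁰_VH + (σ⁰_HH + σ⁰_VV)/2), with VH taken as the cross-pol channel. The paper's exact definitions may differ, and the function names are illustrative only.

```python
"""Hypothetical helpers for backscatter-based SAR indices (assumed definitions)."""


def db_to_linear(sigma0_db: float) -> float:
    """Convert a backscatter coefficient from decibels to linear power."""
    return 10.0 ** (sigma0_db / 10.0)


def sar_indices(hh_db: float, vv_db: float, vh_db: float) -> dict:
    """Compute ratio-type indices from calibrated sigma0 values given in dB.

    Assumed definitions (not taken from the paper), all in linear power:
      vv_vh_ratio = VV / VH
      rvi         = 8*VH / (HH + VV + 2*VH)      (quad-pol radar vegetation index)
      vsi         = VH / (VH + (HH + VV) / 2)    (volume scattering index)
    """
    hh, vv, vh = (db_to_linear(x) for x in (hh_db, vv_db, vh_db))
    return {
        "vv_vh_ratio": vv / vh,
        "rvi": 8.0 * vh / (hh + vv + 2.0 * vh),
        "vsi": vh / (vh + (hh + vv) / 2.0),
    }


if __name__ == "__main__":
    # Made-up pixel values in dB, for illustration only.
    print(sar_indices(hh_db=-8.0, vv_db=-9.5, vh_db=-18.0))
```

The ratios are formed in linear power. In dB the same co-pol/cross-pol ratio would instead be the difference VV_dB − VH_dB.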
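The first screening step keeps only the features significantly correlated with EC at P < 0.05, which reduced 121 features to 52 in this study. It can be sketched in pure Python as follows. The abstract does not say which correlation coefficient or test was used, so this sketch assumes Pearson's r with the usual two-sided t-test: t = r·√((n − 2)/(1 − r²)), with df = n − 2. The p-value comes from the regularized incomplete beta function, evaluated with a standard continued-fraction expansion so that no third-party packages are needed. All function names and the toy data are hypothetical.

```python
"""Hypothetical correlation pre-screening of candidate features against soil EC."""
from math import exp, lgamma, log, sqrt

_TINY = 1e-300  # guards the continued fraction against division by zero


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),           # even term
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry relation


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t statistic with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation; a constant feature is treated as r = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_features(features: dict, ec: list, alpha: float = 0.05) -> dict:
    """Return {name: (r, p)} for features whose correlation with EC has p < alpha."""
    n = len(ec)
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, ec)
        if 1.0 - r * r <= 0.0:
            p = 0.0  # perfect linear relation
        else:
            t = r * sqrt((n - 2) / (1.0 - r * r))
            p = t_two_sided_p(t, n - 2)
        if p < alpha:
            kept[name] = (r, p)
    return kept
```

In use, `screen_features({"B4": [...], "VV_VH": [...]}, ec_values)` returns only the significant features. In the study, the surviving set was then passed on to RFE, Boruta, and VIP.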
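The validation metrics reported for Boruta-MLP (R², RMSE, MAE) can be computed as in the sketch below. It assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation between observed and predicted values, and the abstract does not say which definition was used. RMSE and MAE are in the units of EC, which the abstract does not state. The function name is hypothetical.

```python
"""Validation metrics for observed vs. predicted soil EC (assumed definitions)."""
from math import sqrt


def regression_metrics(observed: list, predicted: list) -> dict:
    """Return R^2 (coefficient of determination), RMSE and MAE."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)     # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

RMSE is always at least as large as MAE, so the reported pair (5.767 and 3.800) is consistent. The gap between them reflects a few larger errors, which RMSE weights more heavily.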
About the journal:
Catena publishes papers describing original field and laboratory investigations and reviews on geoecology and landscape evolution with emphasis on interdisciplinary aspects of soil science, hydrology and geomorphology. It aims to disseminate new knowledge and foster better understanding of the physical environment, of evolutionary sequences that have resulted in past and current landscapes, and of the natural processes that are likely to determine the fate of our terrestrial environment.
Papers within any one of the above topics are welcome provided they are of sufficiently wide interest and relevance.