Jiaming Rong, Jiangyuan Zeng, Kun-Shan Chen, Hongliang Ma, Pengfei Shi, Husi Letu, Xiang Zhang, Xihui Gu, Haiyun Bi, Chunlin Zhang
Remote Sensing of Environment, Volume 331, Article 115040. Published 2025-09-27. DOI: 10.1016/j.rse.2025.115040. Source: https://www.sciencedirect.com/science/article/pii/S0034425725004444
Synergizing machine learning and interpolation methods: A Stacking framework for global-scale satellite soil moisture gap filling
Satellite-derived soil moisture (SM) products frequently contain extensive data gaps that significantly limit their practical utility, so robust gap-filling techniques are needed to generate SM datasets with enhanced accuracy and continuous spatiotemporal coverage. Existing studies have typically relied on a single machine learning or interpolation method to fill SM gaps at regional scales. Machine learning approaches excel at filling missing values over large regions but tend to smooth out important local SM features, whereas interpolation methods perform well in areas with little missing data but exhibit significant uncertainty in regions with large, continuous gaps. The two kinds of approaches are potentially complementary and could together yield a more robust gap-filling method, yet this combination has rarely been investigated. To fill this research gap, we established a novel global-scale SM gap-filling method that uses Stacking to combine the strength of machine learning in large-scale gap filling with the strong performance of interpolation in localized areas. The proposed approach integrates four base models, comprising three machine learning techniques, namely Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Feed-forward Neural Network (FNN), and one interpolation method, Natural Neighbor Interpolation (NNI), and employs the Least Absolute Shrinkage and Selection Operator (LASSO) as the meta-model. We compared the Stacking method with the individual approaches in filling missing ESA CCI SM data, and validated the gap-filled SM against extensive ground SM measurements from 1086 sites worldwide.
The results indicate: (1) RF performs best among the six selected machine learning methods, and its overall accuracy at the global scale is higher than that of the interpolation methods. Feature importance analysis by SHapley Additive exPlanations (SHAP) indicates that ERA5 SM, NDVI, and the Global Aridity Index have high importance in the RF gap-filling model; (2) NNI is the best-performing of the four selected interpolation methods, and it outperforms the machine learning methods in localized areas where the original SM data are relatively abundant; (3) Stacking is an effective method for SM gap filling at the global scale, with an averaged ubRMSE of 0.017 m³/m³, RMSE of 0.022 m³/m³, Bias of 0.006 m³/m³, and R of 0.87 against the original ESA CCI SM, and it reduces the RMSE by 0.009 m³/m³ and the ubRMSE by 0.006 m³/m³, and improves R by 0.15, relative to RF, the best-performing individual method; (4) the gap-filled SM shows improved skill over the original ESA CCI SM when evaluated against globally distributed ground SM, with Stacking yielding the lowest ubRMSE of 0.057 m³/m³ and the highest R of 0.63. The proposed Stacking method opens new avenues for filling the gaps in various satellite SM datasets.
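The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, uses purely synthetic data, and invents the predictor names (`aux_sm`, `aux_veg`) for illustration. scikit-learn has no Natural Neighbor Interpolation, so a distance-weighted nearest-neighbour regressor on the coordinates alone stands in for the interpolation base model. The abstract does not define its metrics; the standard definitions of Bias, RMSE, ubRMSE, and Pearson R are assumed here.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic "pixels": coordinates plus two hypothetical auxiliary predictors.
rng = np.random.default_rng(0)
n = 400
lon, lat = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
aux_sm = rng.uniform(0.05, 0.45, n)   # hypothetical reanalysis-like SM
aux_veg = rng.uniform(0.0, 1.0, n)    # hypothetical vegetation index
sm = (0.6 * aux_sm + 0.05 * aux_veg
      + 0.03 * np.sin(lon) * np.cos(lat) + rng.normal(0, 0.01, n))
X = np.column_stack([lon, lat, aux_sm, aux_veg])

# Withhold roughly 30% of pixels to play the role of satellite data gaps.
observed = rng.random(n) > 0.3

# Stand-in for a spatial interpolator (NOT natural neighbor interpolation):
# it sees only the lon/lat columns and averages nearby observed values.
spatial = make_pipeline(
    ColumnTransformer([("xy", "passthrough", [0, 1])]),
    KNeighborsRegressor(n_neighbors=5, weights="distance"),
)

# Four base models; their out-of-fold predictions (cv=5) train a LASSO meta-model.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbdt", GradientBoostingRegressor(random_state=0)),
        ("fnn", make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(16,),
                                           max_iter=2000, random_state=0))),
        ("interp", spatial),
    ],
    final_estimator=Lasso(alpha=1e-4),  # alpha is an illustrative choice
    cv=5,
)
stack.fit(X[observed], sm[observed])

# Fill the gaps with the stacked prediction.
filled = sm.copy()
filled[~observed] = stack.predict(X[~observed])


def skill(est, ref):
    """Bias, RMSE, unbiased RMSE, and Pearson R of est against ref."""
    diff = est - ref
    anomaly_diff = (est - est.mean()) - (ref - ref.mean())
    return {
        "Bias": float(diff.mean()),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "ubRMSE": float(np.sqrt(np.mean(anomaly_diff ** 2))),
        "R": float(np.corrcoef(est, ref)[0, 1]),
    }


# Score the filled values against the withheld "truth".
scores = skill(filled[~observed], sm[~observed])
print(scores)
print("LASSO weights per base model:", stack.final_estimator_.coef_)
```

The LASSO coefficients show how much weight the meta-model gives each base model, and LASSO can shrink the weight of an unhelpful base model to exactly zero. For a single series, ubRMSE² = RMSE² − Bias² holds exactly. Values averaged over many pixels or sites, like those reported above, need not satisfy that identity.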
About the journal:
Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing.
The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques.
RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.