Synergizing machine learning and interpolation methods: A Stacking framework for global-scale satellite soil moisture gap filling

IF 11.4 | Zone 1, Earth Sciences | Q1 ENVIRONMENTAL SCIENCES
Jiaming Rong, Jiangyuan Zeng, Kun-Shan Chen, Hongliang Ma, Pengfei Shi, Husi Letu, Xiang Zhang, Xihui Gu, Haiyun Bi, Chunlin Zhang
Remote Sensing of Environment, Volume 331, Article 115040. Publication date: 2025-09-27. DOI: 10.1016/j.rse.2025.115040
https://www.sciencedirect.com/science/article/pii/S0034425725004444
Citations: 0

Abstract

Synergizing machine learning and interpolation methods: A Stacking framework for global-scale satellite soil moisture gap filling
Satellite-derived soil moisture (SM) products frequently contain extensive data gaps that significantly limit their practical utility, necessitating robust gap-filling techniques that generate SM datasets with enhanced accuracy and continuous spatiotemporal coverage. Existing studies have typically relied on a single machine learning or interpolation method to fill SM gaps at regional scales. Machine learning approaches excel at filling missing values over large regions but tend to smooth out important local SM features, whereas interpolation methods perform well in areas with little missing data but exhibit significant uncertainty in regions with large amounts of continuously missing data. The two kinds of approaches are potentially complementary and could together yield a more robust gap-filling method, yet this combination has rarely been investigated. To fill this research gap, we established a novel global-scale SM gap-filling method that uses Stacking to combine the strength of machine learning in large-scale gap filling with the strong performance of interpolation in localized areas. The proposed approach integrates four base models: three machine learning techniques, namely Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Feed-forward Neural Network (FNN), and one interpolation method, Natural Neighbor Interpolation (NNI). It employs the Least Absolute Shrinkage and Selection Operator (LASSO) as the meta model. We compared the Stacking method with the individual approaches in filling missing ESA CCI SM data, and validated the gap-filled SM using extensive ground SM data from 1086 sites worldwide. The results indicate: (1) RF performs best among the six selected machine learning methods, and its overall accuracy at a global scale is higher than that of the interpolation methods. Feature importance analysis by SHapley Additive exPlanations (SHAP) indicates that ERA5 SM, NDVI, and the Global Aridity Index have high importance in the RF gap-filling model; (2) NNI is the best-performing of the four selected interpolation methods, and it outperforms the machine learning methods in localized areas where the original SM data are relatively abundant; (3) Stacking is an effective method for SM gap filling at a global scale, with an averaged ubRMSE of 0.017 m³/m³, RMSE of 0.022 m³/m³, Bias of 0.006 m³/m³, and R of 0.87 against the original ESA CCI SM; it reduces the RMSE by 0.009 m³/m³ and the ubRMSE by 0.006 m³/m³, and improves R by 0.15, relative to RF, the best-performing individual method; (4) the gap-filled SM shows better skill than the original ESA CCI SM against globally distributed ground SM, with Stacking displaying the lowest ubRMSE of 0.057 m³/m³ and the highest R of 0.63. The proposed Stacking method opens new avenues for filling the gaps in various satellite SM datasets.
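The four scores quoted in the abstract (Bias, RMSE, ubRMSE, and R) can be illustrated with a minimal sketch using the standard definitions of these metrics. The function name `sm_metrics` and the sample values are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in this listing.

```python
import math


def sm_metrics(estimate, reference):
    """Bias, RMSE, ubRMSE and Pearson R for paired soil moisture values (m3/m3).

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(estimate) != len(reference) or len(estimate) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    n = len(estimate)
    diffs = [e - r for e, r in zip(estimate, reference)]

    bias = sum(diffs) / n                             # mean error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)   # root mean square error
    # Unbiased RMSE removes the mean offset: ubRMSE^2 = RMSE^2 - Bias^2.
    # max(..., 0.0) guards against tiny negative values from rounding.
    ubrmse = math.sqrt(max(rmse * rmse - bias * bias, 0.0))

    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    denom = math.sqrt(var_e * var_r)
    # Pearson correlation is undefined when either series is constant.
    r = cov / denom if denom > 0 else float("nan")

    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


# Made-up values: the estimate equals the reference plus a constant 0.02 offset,
# so all of the error is bias and none of it is random scatter.
reference = [0.10, 0.20, 0.30, 0.40]
estimate = [0.12, 0.22, 0.32, 0.42]
metrics = sm_metrics(estimate, reference)
print(metrics)
```

In this toy case the bias and RMSE are both 0.02 m³/m³ while the ubRMSE is essentially zero, which is why ubRMSE is often reported alongside RMSE when a systematic offset is expected. The identity between the three holds for a single paired series; it need not hold between values that have each been averaged over many pixels or sites, as the abstract's "averaged" scores are.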
Source journal
Remote Sensing of Environment (Environmental Sciences: Imaging Science & Photographic Technology)
CiteScore: 25.10
Self-citation rate: 8.90%
Articles published: 455
Review time: 53 days
Journal description: Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing. The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques. RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.