Yakun Wang, Qiuru Zhang, Shikun Sun, Yifei Yao, Xiaotao Hu, Shibiao Cai, Hanbo Wang
Estimation of soil moisture profiles in ungauged stations by hybridizing sequential data assimilation and machine learning
Journal of Hydrology, Volume 663, Article 134292 (published 2025-09-19; impact factor 6.3; JCR Q1, Engineering, Civil)
DOI: 10.1016/j.jhydrol.2025.134292
URL: https://www.sciencedirect.com/science/article/pii/S0022169425016324
Citations: 0
Abstract
Although soil moisture content (SMC) is crucial for understanding interactions at the land–atmosphere interface, regional-scale, long-term, high-resolution SMC datasets remain scarce. Deducing regionally continuous SMC from discrete point-scale observations remains a prevalent and challenging issue. Existing physical models often struggle with parameter acquisition and high computational costs, whereas purely data-driven machine learning (ML) models may perform poorly outside their calibration range because they lack physical constraints. This study proposed a sequential hybrid method (mRestart-EnKF-ML) that combined the modified restart ensemble Kalman filter (mRestart-EnKF) with ML to estimate SMC in ungauged stations using historical information from adjacent available stations. Using a series of real-world cases, we demonstrated both the ability and the challenges of sequentially retrieving SMC in ungauged stations. The results showed that the Kalman update improved the reliability of ML modeling by updating soil hydraulic parameters in real time, thereby enhancing the overall estimation accuracy of SMC, especially for the surface layer. Compared with purely data-driven models, the proposed mRestart-EnKF-ML significantly reduced the RMSE of SMC retrievals by both expanding the training dataset and introducing physical constraints into the ML models. According to the SHAP analysis, the impacts of input features on mRestart-EnKF-ML for SMC estimation exhibited significant spatial heterogeneity and aligned with physical processes. Training data with denser vertical spatial resolution can implicitly offer more accurate physical knowledge, such as mass conservation, thus ensuring the proposed method's robustness in various application scenarios. The performance of mRestart-EnKF-ML was significantly influenced by observation error settings, station-specific characteristics, and depth-dependent responses, with optimal error configurations varying by station type and depth playing key roles in shaping error-accuracy relationships.
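The Kalman update mentioned in the abstract can be illustrated with a generic textbook step. The sketch below is not the authors' mRestart-EnKF or their ML coupling; it is a plain stochastic (perturbed-observation) ensemble Kalman filter analysis step applied to an augmented vector of states and one parameter, which is the usual way an observation of surface SMC can also adjust an unobserved layer and a hydraulic parameter through ensemble cross-covariances. The function name `enkf_update`, the three-row toy ensemble, the parameter `log_ks`, and every numeric value are assumptions made only for illustration.

```python
import numpy as np


def enkf_update(ensemble, obs, obs_operator, obs_error_std, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble      : (n_aug, n_members) augmented vectors [states; parameters]
    obs           : (n_obs,) observed values
    obs_operator  : (n_obs, n_aug) linear map from augmented vector to observations
    obs_error_std : assumed observation error standard deviation
    """
    n_obs = obs.shape[0]
    n_members = ensemble.shape[1]

    # Ensemble anomalies around the ensemble mean
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    predicted = obs_operator @ ensemble
    predicted_anom = obs_operator @ anomalies

    # Sample covariances and Kalman gain K = P H^T (H P H^T + R)^-1
    cov_xy = anomalies @ predicted_anom.T / (n_members - 1)
    cov_yy = predicted_anom @ predicted_anom.T / (n_members - 1)
    obs_cov = (obs_error_std ** 2) * np.eye(n_obs)
    gain = cov_xy @ np.linalg.inv(cov_yy + obs_cov)

    # Each member assimilates its own perturbed copy of the observation
    perturbed = obs[:, None] + rng.normal(0.0, obs_error_std, size=(n_obs, n_members))
    return ensemble + gain @ (perturbed - predicted)


# Toy prior ensemble (all values hypothetical): two SMC layers and one parameter
rng = np.random.default_rng(0)
n_members = 200
surface = rng.normal(0.25, 0.05, n_members)  # surface-layer SMC (cm3/cm3)
deep = 0.30 + 0.5 * (surface - 0.25) + rng.normal(0.0, 0.01, n_members)
log_ks = -1.0 + 4.0 * (surface - 0.25) + rng.normal(0.0, 0.1, n_members)
prior = np.vstack([surface, deep, log_ks])

H = np.array([[1.0, 0.0, 0.0]])  # only the surface layer is observed
obs = np.array([0.32])
posterior = enkf_update(prior, obs, H, 0.01, rng)

print("prior mean:    ", prior.mean(axis=1))
print("posterior mean:", posterior.mean(axis=1))
```

In this toy setup the observation error standard deviation (0.01) is the knob that corresponds to the "observation error settings" discussed in the abstract: a larger value shrinks the gain and leaves the ensemble closer to its prior, while a smaller value pulls the observed layer, and anything correlated with it, more strongly toward the measurement.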
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.