Yuanhao Xu, Kairong Lin, Caihong Hu, Shuli Wang, Qiang Wu, Jingwen Zhang, Mingzhong Xiao, Yufu Luo
Journal of Hydrology. Published 2024-07-05. DOI: 10.1016/j.jhydrol.2024.131598. https://www.sciencedirect.com/science/article/pii/S0022169424009946
Interpretable machine learning on large samples for supporting runoff estimation in ungauged basins
Flow measurement data and basin characteristic information are unevenly distributed, with most flow observations recorded at a limited number of well-monitored locations. Achieving reliable and robust hydrological modeling in ungauged catchments through regionalization remains a persistent challenge. The increasing availability of large-scale hydrological datasets, coupled with recent advances in machine learning, offers new opportunities to explore the associations between basin attributes and hydrological parameters and thereby improve streamflow predictions. We propose a novel cross-regional parameter transfer approach based on interpretable machine learning (XGBoost) to predict runoff processes in ungauged regions by leveraging well-trained models across numerous basins within climate zones. We validate the framework across 5,764 basins in a large-sample dataset (Caravan), using Nash-Sutcliffe Efficiency (NSE), RMSE and bias to assess performance, and compare it with deep transfer learning based on LSTM and Transformer models. Results indicate that the proposed method achieves NSE values exceeding 0.2 for 75% of the ungauged basins, demonstrating superior performance and more stable accuracy than pure deep learning models, owing to its incorporation of physical constraints. Furthermore, SHAP values are used to elucidate how the parameters respond to basin attributes within different climatic zones in the large-sample context, enriching the understanding of hydrological features through data-driven inverse inference. These findings underscore the capability of interpretable machine learning to leverage hydro-physical regularities extracted from abundant basin features, thereby enhancing the accuracy of runoff predictions in ungauged regions.
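The three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because the abstract does not say which form of bias is used, the sketch assumes the mean error (simulated minus observed). The helper `fraction_above` shows how a summary such as "NSE exceeding 0.2 for 75% of basins" could be computed from per-basin scores.

```python
import math
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of squared model error
    to the variance of the observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    if obs_variance == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - squared_error / obs_variance


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean squared error between simulated and observed flow."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mean_bias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean error (simulated minus observed); an assumed definition of bias."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Share of basins whose score strictly exceeds the threshold."""
    return sum(1 for x in scores if x > threshold) / len(scores)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]   # made-up flows for illustration only
    simulated = [2.0, 3.0, 4.0]
    print(nse(observed, simulated), rmse(observed, simulated),
          mean_bias(observed, simulated))
```

An NSE of 1 indicates a perfect match, while 0 means the simulation is no better than predicting the observed mean, which is why a threshold on NSE is a common way to summarize skill across many basins.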
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.