Interpretable machine learning on large samples for supporting runoff estimation in ungauged basins

IF 5.9 · Zone 1 (Earth Science) · JCR Q1 (Engineering, Civil)
Yuanhao Xu, Kairong Lin, Caihong Hu, Shuli Wang, Qiang Wu, Jingwen Zhang, Mingzhong Xiao, Yufu Luo
Journal: Journal of Hydrology
Publication type: Journal Article
Published: 2024-07-05
DOI: 10.1016/j.jhydrol.2024.131598
URL: https://www.sciencedirect.com/science/article/pii/S0022169424009946
Citations: 0

Abstract

The availability of flow measurements and basin characteristic information is highly uneven, with most flow observations recorded at a limited number of well-monitored locations. Achieving reliable and robust hydrological modeling in ungauged catchments through regionalization therefore remains a long-standing challenge. The increasing availability of large-scale hydrological datasets, coupled with recent advances in machine learning, offers new opportunities to explore associations between basin attributes and hydrological parameters and thereby improve streamflow predictions. We propose a novel cross-regional parameter transfer approach based on interpretable machine learning (XGBoost) to accurately predict runoff processes in ungauged regions by leveraging well-trained models across numerous basins within climate zones. We validate the framework across 5,764 basins in a large-sample dataset (Caravan), using Nash-Sutcliffe Efficiency (NSE), root mean square error (RMSE), and bias to assess performance, and compare it with deep transfer learning based on LSTM and Transformer models. Results indicate that the proposed method achieves NSE values exceeding 0.2 for 75% of the ungauged basins, with better performance and more stable accuracy than pure deep learning models, owing to its incorporation of physical constraints. Furthermore, SHAP values are used to show how the parameters respond to basin attributes within different climatic zones in the large-sample context, enriching the understanding of hydrological features through data-driven inverse inference. These findings underscore the capability of interpretable machine learning to leverage hydro-physical regularities extracted from abundant basin features, thereby enhancing the accuracy of runoff predictions in ungauged regions.
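For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how NSE, RMSE, and bias can be computed, and how a statement such as "NSE exceeding 0.2 for 75% of basins" can be checked. The function names and the toy numbers are illustrative assumptions, not taken from the paper. The abstract does not say which bias definition is used, so relative volume bias is assumed here.

```python
import math
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error between simulated and observed flow."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def relative_bias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Relative volume bias (an assumed definition): (sum(sim) - sum(obs)) / sum(obs)."""
    return (sum(sim) - sum(obs)) / sum(obs)


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Share of basins whose score is strictly greater than the threshold."""
    return sum(1 for v in scores if v > threshold) / len(scores)


# Toy data for illustration only (hypothetical values, not from the Caravan dataset).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
basin_nse = [0.5, 0.1, 0.3, 0.25]

print(nse(obs, sim), rmse(obs, sim), relative_bias(obs, sim))
print(fraction_above(basin_nse, 0.2))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than predicting the observed mean, which is why a per-basin threshold is a common way to summarize performance across a large sample.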

Source journal
Journal of Hydrology (Geosciences, Multidisciplinary)
CiteScore: 11.00
Self-citation rate: 12.50%
Articles published: 1309
Review time: 7.5 months
Journal description: The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, civil and environmental engineering are included. Social science perspectives on hydrological problems such as resource and ecological economics, environmental sociology, psychology and behavioural science, management and policy analysis are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.