Towards Interpreting Machine-Learning Models for Multi-Step Ahead Daily Streamflow Forecasting

IF 3.2 · Zone 3 (Earth Science) · Q1 Environmental Science

Hydrological Processes Pub Date : 2025-05-19 DOI:10.1002/hyp.70163

Ruonan Hao, Huaxiang Yan

Hydrological Processes, volume 39, issue 5. Full text: https://onlinelibrary.wiley.com/doi/10.1002/hyp.70163

Citations: 0


Towards Interpreting Machine-Learning Models for Multi-Step Ahead Daily Streamflow Forecasting

Streamflow forecasting using interpretable machine learning methods (MLs) for exploring runoff processes has received a lot of attention. However, work on multi-step ahead daily streamflow forecasting that considers antecedent streamflow as an input for various interpretable MLs is very limited. Thus, three interpretable MLs for daily streamflow forecasting in the Huaihe River basin of China during 2002–2020, including eXtreme Gradient Boosting (XGBoost), long short-term memory neural network (LSTM) and convolutional neural network (CNN) with the SHapley Additive exPlanations (SHAP) method, were implemented to study the role of potential controlling factors, including antecedent streamflow, soil moisture and vegetation growth, in runoff processes at lead times of 0–6 days. The forecasting performances decreased with lead times. Specifically, the LSTM model performed best at lead times of 0–3 days, followed by CNN and XGBoost. CNN was superior to LSTM and XGBoost models when the lead time was greater than 3 days. The optimal forecasting performances were 0.71–0.97, 311.45–674.27 m³/s, 0.84–0.97 and 0.75–0.97 according to Nash-Sutcliffe efficiency, root-mean-square error, correlation coefficient and Kling-Gupta efficiency, respectively. The interpretable results varied across different MLs and at different lead times. The antecedent streamflow consistently dominated the runoff processes, particularly in the LSTM and XGBoost models. However, the significant role of soil moisture at the depth of 28–100 cm and leaf area index for low vegetation gradually emerged with increased lead times for CNN models, even outranking the importance of antecedent streamflow. Furthermore, the interpretability demonstrated by the optimal machine learning models was validated through the infiltration model and uncertainty analysis. Overall, interpretable machine learning has great potential to enhance our understanding of basin-scale runoff processes.
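
For readers unfamiliar with the four scores quoted in the abstract, the following is a minimal sketch of how they are commonly computed, together with one possible way of pairing antecedent streamflow with a target at a given lead time. It is not the authors' code. The function names are hypothetical, the Kling-Gupta efficiency is written in its widely used 2009 form (the abstract does not say which variant was used), and the convention that a lead time of 0 means "predict day t from flows up to day t-1" is an assumption.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of obs about its mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root-mean-square error, in the units of the input (e.g. m^3/s)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated flow."""
    return float(np.corrcoef(obs, sim)[0, 1])


def kge(obs, sim):
    """Kling-Gupta efficiency, 2009 form (assumed variant)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def make_samples(flow, n_lags, lead):
    """Pair the n_lags flows ending at day t-1 with the flow on day t+lead.

    Assumed convention: lead=0 targets day t itself.
    """
    flow = np.asarray(flow, float)
    X, y = [], []
    for t in range(n_lags, len(flow) - lead):
        X.append(flow[t - n_lags:t])  # antecedent streamflow window
        y.append(flow[t + lead])      # target at the requested lead time
    return np.array(X), np.array(y)
```

With this layout, one model per lead time can be trained on `make_samples(flow, n_lags, lead)` for `lead` in 0 to 6, and each forecast scored with the four functions above. Additional predictors such as soil moisture or leaf area index would be appended to each input window in the same way.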


Source journal: Hydrological Processes (Environmental Sciences, Water Resources)
CiteScore: 6.00
Self-citation rate: 12.50%
Articles published: 313
Review time: 2-4 weeks

About the journal: Hydrological Processes is an international journal that publishes original scientific papers advancing understanding of the mechanisms underlying the movement and storage of water in the environment, and the interaction of water with geological, biogeochemical, atmospheric and ecological systems. Not all papers related to water resources are appropriate for submission to this journal; rather we seek papers that clearly articulate the role(s) of hydrological processes.