Towards Interpreting Machine-Learning Models for Multi-Step Ahead Daily Streamflow Forecasting
Authors: Ruonan Hao, Huaxiang Yan
Journal: Hydrological Processes, vol. 39, no. 5 (published 2025-05-19)
DOI: 10.1002/hyp.70163
URL: https://onlinelibrary.wiley.com/doi/10.1002/hyp.70163
Streamflow forecasting with interpretable machine learning methods (MLs) has received considerable attention as a way to explore runoff processes. However, few studies have examined multi-step ahead daily streamflow forecasting that uses antecedent streamflow as an input across several interpretable MLs. This study therefore implemented three MLs, eXtreme Gradient Boosting (XGBoost), long short-term memory neural network (LSTM) and convolutional neural network (CNN), each interpreted with the SHapley Additive exPlanations (SHAP) method, for daily streamflow forecasting in the Huaihe River basin of China during 2002–2020. The aim was to study the role of potential controlling factors, including antecedent streamflow, soil moisture and vegetation growth, in runoff processes at lead times of 0–6 days. Forecasting performance decreased as lead time increased. Specifically, the LSTM model performed best at lead times of 0–3 days, followed by CNN and XGBoost, whereas CNN outperformed both LSTM and XGBoost at lead times greater than 3 days. The optimal forecasting performances were 0.71–0.97 (Nash-Sutcliffe efficiency), 311.45–674.27 m³/s (root-mean-square error), 0.84–0.97 (correlation coefficient) and 0.75–0.97 (Kling-Gupta efficiency). The interpretation results varied across MLs and lead times. Antecedent streamflow consistently dominated the runoff processes, particularly in the LSTM and XGBoost models. For the CNN models, however, soil moisture at a depth of 28–100 cm and the leaf area index for low vegetation became increasingly important as lead time increased, even outranking antecedent streamflow. Furthermore, the interpretations produced by the optimal machine learning models were validated through an infiltration model and uncertainty analysis. Overall, interpretable machine learning has great potential to enhance our understanding of basin-scale runoff processes.
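The four skill scores named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the Kling-Gupta efficiency below assumes the original 2009 formulation (correlation, variability ratio and bias ratio), since the abstract does not say which variant was used.

```python
import numpy as np


def _as_arrays(obs, sim):
    """Convert inputs to float arrays and check that they align."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("obs and sim must have the same shape")
    return obs, sim


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = _as_arrays(obs, sim)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root-mean-square error, in the units of the flow series (e.g. m^3/s)."""
    obs, sim = _as_arrays(obs, sim)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated flow."""
    obs, sim = _as_arrays(obs, sim)
    return float(np.corrcoef(obs, sim)[0, 1])


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form): 1 is perfect."""
    obs, sim = _as_arrays(obs, sim)
    r = pearson_r(obs, sim)
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


if __name__ == "__main__":
    # Made-up flows for illustration only; not data from the study.
    observed = [120.0, 340.0, 560.0, 410.0, 230.0]
    forecast = [130.0, 320.0, 590.0, 400.0, 250.0]
    for name, fn in [("NSE", nse), ("RMSE", rmse), ("r", pearson_r), ("KGE", kge)]:
        print(f"{name}: {fn(observed, forecast):.3f}")
```

In a multi-step setting, scores like these would be computed separately for each lead time, which is how a decline in skill with lead time becomes visible.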
About the journal:
Hydrological Processes is an international journal that publishes original scientific papers advancing understanding of the mechanisms underlying the movement and storage of water in the environment, and the interaction of water with geological, biogeochemical, atmospheric and ecological systems. Not all papers related to water resources are appropriate for submission to this journal; rather we seek papers that clearly articulate the role(s) of hydrological processes.