Data collection strategy for building rainfall-runoff LSTM model predicting daily runoff

Journal of Korea Water Resources Association Pub Date : 2021-10-01 DOI:10.3741/JKWRA.2021.54.10.795

Dongkyun Kim


Citations: 3

Abstract

This study developed an LSTM-based deep learning model for estimating daily runoff in the Soyang River Dam basin and investigated its accuracy for various combinations of model structure and input data. The model was trained on a database of average daily precipitation, average daily temperature, and average daily wind speed (the inputs) and daily average flow rate (the output) for the first 12 years (1997.1.1-2008.12.31). The Nash-Sutcliffe Model Efficiency Coefficient (NSE) and RMSE were then computed for validation against the flow discharge data of the later 12 years (2009.1.1-2020.12.31). The highest accuracy was obtained when all available input data (12 years of daily precipitation, temperature, and wind speed) were used with an LSTM structure of 64 hidden units; the NSE and RMSE for the validation period were 0.862 and 76.8 m³/s, respectively. When the number of LSTM hidden units exceeds 500, performance begins to degrade because of overfitting, and when it exceeds 1000, the overfitting problem becomes prominent. A model with very high performance (NSE = 0.80-0.84) could be obtained when only the 12 years of daily precipitation were used for training. A model with reasonably high performance (NSE = 0.63-0.85) could be obtained even when only one year of input data was used for training. In particular, an accurate model (NSE = 0.85) could be obtained if that one year of training data covers a wide range of flow magnitudes, including extreme flows and droughts as well as normal events. If the training data include both normal and extreme flow rates, extending the training record beyond 5 years did not significantly improve model performance.
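The two validation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of NSE and RMSE, not code from the paper; the function names `nse` and `rmse` and the four sample flow values are assumptions chosen purely for illustration.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    sse = np.sum((obs - sim) ** 2)             # squared error of the model
    spread = np.sum((obs - obs.mean()) ** 2)   # squared error of the "mean flow" baseline
    return float(1.0 - sse / spread)


def rmse(observed, simulated):
    """Root-mean-square error, in the same units as the flow data (e.g. m^3/s)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


# Hypothetical daily flows (m^3/s); these are NOT values from the study.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 39.0]

print(f"NSE  = {nse(obs, sim):.3f}")
print(f"RMSE = {rmse(obs, sim):.3f} m^3/s")
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than always predicting the mean observed flow, which is why values such as 0.862 are read as high accuracy.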

