Assessing the Performance of a Long Short-Term Memory Algorithm in the Dataset with Missing Values

Hyun-Geoun Park, S. Suh, G. Jo, Jinuk Jang, S. Ki
Journal: daehanhwangyeonggonghaghoeji
Published: 2022-12-31 (Journal Article)
DOI: https://doi.org/10.4491/ksee.2022.44.12.636
Citations: 0

Abstract

This study assessed the performance of a long short-term memory (LSTM) algorithm, which is well suited to time series prediction, on a multivariate dataset with missing values. The full dataset for the LSTM model was prepared by running a widely used watershed model, the Hydrological Simulation Program-Fortran (HSPF), for the upper Nam River Basin on a daily time step for 3 years (2016 to 2018), excluding a one-year warm-up period. Prediction accuracy was evaluated in response to (i) different interpolation methods, (ii) changes in the number of missing values in the dependent variables, and (iii) changes in the number of independent variables containing a fixed number of missing values, for either a single variable or multiple variables. The entire dataset was divided into training and test sets at a ratio of 7:3. The results showed that the choice of interpolation method caused considerable variation in LSTM performance. Among the methods tested, StructTS and RPART were selected as the best imputation methods for recovering missing values of discharge and total phosphorus, respectively. The prediction error of the LSTM model increased gradually as the number of missing values rose from 300 to 700. Nevertheless, the model appeared to maintain its performance fairly well even with a large number of missing values, as long as an adequate interpolation method was adopted for each dependent variable. Performance degraded further as the number of independent variables containing the fixed number of missing values increased from 1 to 7. We believe that the proposed methodology can be used not only to reconstruct missing values in real-time monitoring datasets with excellent performance, but also to improve the prediction accuracy of time series deep learning models.
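
The evaluation loop described in the abstract (remove a known number of values, impute them, score the reconstruction, then split 7:3) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the series is synthetic rather than HSPF output, plain linear interpolation stands in for the StructTS and RPART methods the study selected, the split is taken chronologically (the abstract does not say how it was drawn), the intermediate count of 500 missing values is chosen arbitrarily between the reported 300 and 700, and all function names are hypothetical.

```python
import math
import random

def make_series(n_days=1096, seed=0):
    """Synthetic daily series (2016-2018 has 1096 days); not real HSPF output."""
    rng = random.Random(seed)
    return [10 + 5 * math.sin(2 * math.pi * t / 365) + rng.gauss(0, 0.5)
            for t in range(n_days)]

def knock_out(series, n_missing, seed=0):
    """Copy the series and replace n_missing interior values with None."""
    rng = random.Random(seed)
    # Endpoints are never removed, so every gap has a known value on each side.
    idx = rng.sample(range(1, len(series) - 1), n_missing)
    holed = list(series)
    for i in idx:
        holed[i] = None
    return holed, sorted(idx)

def linear_interpolate(holed):
    """Fill None gaps on a straight line between the nearest known neighbours.
    This is a stand-in imputation method, not StructTS or RPART."""
    filled = list(holed)
    known = [i for i, v in enumerate(holed) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = holed[left] * (1 - w) + holed[right] * w
    return filled

def rmse(truth, estimate, idx):
    """Root-mean-square error over the positions that were removed."""
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in idx) / len(idx))

def chronological_split(series, train_ratio=0.7):
    """Split into training and test parts at the 7:3 point, preserving order."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

truth = make_series()
for n_missing in (300, 500, 700):
    holed, idx = knock_out(truth, n_missing)
    filled = linear_interpolate(holed)
    train, test = chronological_split(filled)
    print(n_missing, len(train), len(test), round(rmse(truth, filled, idx), 3))
```

In a fuller experiment the imputed training portion would be fed to an LSTM and the error measured on its predictions; here the error is computed on the imputed values themselves, which only indicates how well the gaps were reconstructed.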