Performance analysis of neural network architectures for time series forecasting: A comparative study of RNN, LSTM, GRU, and hybrid models

Impact factor: 1.9 · JCR Q2 · Multidisciplinary Sciences

MethodsX · Publication date: 2025-07-08 · DOI: 10.1016/j.mex.2025.103462

Ariana Yunita, MHD Iqbal Pratama, Muhammad Zaki Almuzakki, Hani Ramadhan, Emelia Akashah P. Akhir, Andi Besse Firdausiah Mansur, Ahmad Hoirul Basori

MethodsX, Volume 15, Article 103462. Journal Article. https://www.sciencedirect.com/science/article/pii/S2215016125003073

Citations: 0

Abstract

Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) have gained significant popularity in time series forecasting across diverse domains including healthcare, astronomy, and engineering. However, the inherent variability in model performance due to random weight initialization raises questions about the reliability and consistency of these architectures for time series analysis. This study addresses this concern by conducting a comprehensive benchmark evaluation of nine neural network architectures: vanilla RNN, LSTM, GRU, and six hybrid configurations (RNN-LSTM, RNN-GRU, LSTM-RNN, GRU-RNN, LSTM-GRU, and GRU-LSTM). Performance evaluation was conducted using Monte Carlo simulation with 100 iterations across three real-world datasets: sunspot activity, Indonesian COVID-19 cases, and dissolved oxygen concentration measurements. Statistical analysis employed the Friedman test to assess performance differences across architectures. Results showed no statistically significant differences among the nine architectures. Despite the lack of statistical significance, consistent performance patterns emerged favoring LSTM-based hybrid architectures. The LSTM-GRU and LSTM-RNN configurations demonstrated superior performance across multiple evaluation metrics, with LSTM-RNN excelling in sunspot and dissolved oxygen forecasting, while standalone LSTM showed optimal performance for COVID-19 prediction. These findings provide evidence-based guidance for architecture selection in time series forecasting applications, suggesting that while statistical equivalence exists among architectures, LSTM-based hybrids offer practical advantages in terms of consistency and robustness across diverse temporal patterns.
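The evaluation protocol described in the abstract (repeated runs per architecture, then a Friedman test across architectures) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each Monte Carlo iteration is treated as one block and each architecture as one treatment, which the abstract does not state. The error values are synthetic placeholders drawn from a single distribution, not results from the paper, and the helper names (`average_ranks`, `chi2_sf`, `friedman_test`) are hypothetical. The statistic uses the standard rank-sum formula without a tie correction.

```python
import math
import random

# Architecture names as listed in the abstract.
MODELS = ["RNN", "LSTM", "GRU", "RNN-LSTM", "RNN-GRU",
          "LSTM-RNN", "GRU-RNN", "LSTM-GRU", "GRU-LSTM"]


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square survival function via the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def friedman_test(errors):
    """errors[i][j] is the error of model j in run i.
    Returns (statistic, p_value, mean_rank_per_model)."""
    n, k = len(errors), len(errors[0])
    rank_sums = [0.0] * k
    for row in errors:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    stat = max(stat, 0.0)  # guard against tiny negative round-off
    return stat, chi2_sf(stat, k - 1), [r / n for r in rank_sums]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholder: 100 runs x 9 models, identical error distribution.
    errors = [[rng.gauss(1.0, 0.1) for _ in MODELS] for _ in range(100)]
    stat, p_value, mean_ranks = friedman_test(errors)
    print(f"Friedman statistic = {stat:.3f}, p = {p_value:.3f}")
    for name, rank in sorted(zip(MODELS, mean_ranks), key=lambda t: t[1]):
        print(f"{name:9s} mean rank {rank:.2f}")
```

Under this setup a large p-value means the rank differences between architectures are compatible with chance, while the mean ranks still give a descriptive ordering. That matches the way the abstract reports both a non-significant test and a preference for LSTM-based hybrids.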
