Performance analysis of neural network architectures for time series forecasting: A comparative study of RNN, LSTM, GRU, and hybrid models
Ariana Yunita, MHD Iqbal Pratama, Muhammad Zaki Almuzakki, Hani Ramadhan, Emelia Akashah P. Akhir, Andi Besse Firdausiah Mansur, Ahmad Hoirul Basori
MethodsX, Volume 15, Article 103462, published 2025-07-08
DOI: 10.1016/j.mex.2025.103462
https://www.sciencedirect.com/science/article/pii/S2215016125003073
Abstract
Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) have become popular for time series forecasting in diverse domains, including healthcare, astronomy, and engineering. However, because model performance varies with random weight initialization, the reliability and consistency of these architectures for time series analysis remain open questions. This study addresses this concern with a comprehensive benchmark of nine neural network architectures: vanilla RNN, LSTM, GRU, and six hybrid configurations (RNN-LSTM, RNN-GRU, LSTM-RNN, GRU-RNN, LSTM-GRU, and GRU-LSTM). Performance was evaluated using Monte Carlo simulation with 100 iterations on three real-world datasets: sunspot activity, Indonesian COVID-19 cases, and dissolved oxygen concentration measurements. The Friedman test was used to assess performance differences across architectures, and it found no statistically significant differences among the nine architectures. Nevertheless, consistent performance patterns favored LSTM-based hybrid architectures. The LSTM-GRU and LSTM-RNN configurations showed the strongest performance across multiple evaluation metrics, with LSTM-RNN excelling in sunspot and dissolved oxygen forecasting, while standalone LSTM performed best for COVID-19 prediction. These findings offer evidence-based guidance for architecture selection in time series forecasting: although the differences among architectures were not statistically significant, LSTM-based hybrids offer practical advantages in consistency and robustness across diverse temporal patterns.
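The evaluation protocol described in the abstract, repeated training under different random seeds followed by a Friedman test on the resulting errors, can be sketched in Python. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' code. `toy_rmse` is a hypothetical stand-in for "train this architecture with this seed and return its test RMSE", and treating each Monte Carlo run as one Friedman block is also an assumption made for illustration: the abstract does not say how blocks were formed, and dataset-metric combinations are another plausible choice. The statistic is the standard Friedman Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), computed without a tie correction and compared against a chi-square distribution with k - 1 degrees of freedom.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def monte_carlo_scores(
    score_fn: Callable[[str, int], float],
    architectures: Sequence[str],
    n_runs: int = 100,
) -> Dict[str, List[float]]:
    """Score every architecture once per run; run i passes seed i to score_fn."""
    return {arch: [score_fn(arch, seed) for seed in range(n_runs)] for arch in architectures}


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank values within one block (1 = lowest error); ties share their mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df (closed form)."""
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(
        h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail


def friedman_test(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman Q statistic (no tie correction) and its chi-square p-value."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    q = max(q, 0.0)  # Q is non-negative; clamp tiny negative rounding error
    return q, chi2_sf(q, k - 1)


def toy_rmse(arch: str, seed: int) -> float:
    # Hypothetical stand-in: a real study would train `arch` with this seed here.
    rng = random.Random(f"{arch}-{seed}")
    return 1.0 + rng.gauss(0, 0.1)


archs = ["RNN", "LSTM", "GRU", "RNN-LSTM", "RNN-GRU",
         "LSTM-RNN", "GRU-RNN", "LSTM-GRU", "GRU-LSTM"]
scores = monte_carlo_scores(toy_rmse, archs, n_runs=100)
for arch in archs:
    print(f"{arch:9s} mean={statistics.mean(scores[arch]):.3f} "
          f"sd={statistics.stdev(scores[arch]):.3f}")

# Assumption: one block per Monte Carlo run, one column per architecture.
blocks = [[scores[a][i] for a in archs] for i in range(100)]
q, p = friedman_test(blocks)
print(f"Friedman Q={q:.2f}, p={p:.3f}")
```

A p-value above the chosen significance level (for example 0.05) means the test did not detect a difference among the architectures; on its own it does not establish that they are equivalent. SciPy's `scipy.stats.friedmanchisquare` applies a tie correction, so its statistic can differ slightly from this sketch when ties occur.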