Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset
{"title":"Machine learning and deep learning models based grid search cross validation for short-term solar irradiance forecasting","authors":"Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset","doi":"10.1186/s40537-024-00991-w","DOIUrl":null,"url":null,"abstract":"<p>In late 2023, the United Nations conference on climate change (COP28), which was held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. The accurate forecasting of solar irradiance guarantees sustainable power production even when solar irradiance is not present. Batteries can store solar energy to be used during periods of solar absence. Additionally, deterministic models take into account the specification of technical PV systems and may be not accurate for low solar irradiance. This paper presents a comparative study for the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad during a five-year period, from 2015 to 2019, at hourly intervals with accurate meteorological sensors. Furthermore, the Grid Search Cross Validation (GSCV) with five folds is introduced to ML and DL models for optimizing the hyperparameters of these models. Several performance metrics are used to assess the algorithms, such as the <i>Adjusted R</i><sup><i>2</i></sup><i> score</i>, <i>Normalized Root Mean Square Error</i> (NRMSE), <i>Mean Absolute Deviation</i> (MAD), <i>Mean Absolute Error</i> (MAE) and <i>Mean Square Error</i> (MSE). The statistical analysis shows that CNN-LSTM outperforms its counterparts of nine well-known DL models with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.984. For ML algorithms, gradient boosting regression is an effective forecasting method with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.962, beating its rivals of six ML models. Furthermore, SHAP and LIME are examples of explainable Artificial Intelligence (XAI) utilized for understanding the reasons behind the obtained results.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"13 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-024-00991-w","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
In late 2023, the United Nations conference on climate change (COP28), which was held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. The accurate forecasting of solar irradiance guarantees sustainable power production even when solar irradiance is not present. Batteries can store solar energy to be used during periods of solar absence. Additionally, deterministic models take into account the specification of technical PV systems and may be not accurate for low solar irradiance. This paper presents a comparative study for the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad during a five-year period, from 2015 to 2019, at hourly intervals with accurate meteorological sensors. Furthermore, the Grid Search Cross Validation (GSCV) with five folds is introduced to ML and DL models for optimizing the hyperparameters of these models. Several performance metrics are used to assess the algorithms, such as the Adjusted R2 score, Normalized Root Mean Square Error (NRMSE), Mean Absolute Deviation (MAD), Mean Absolute Error (MAE) and Mean Square Error (MSE). The statistical analysis shows that CNN-LSTM outperforms its counterparts of nine well-known DL models with Adjusted R2 score value of 0.984. For ML algorithms, gradient boosting regression is an effective forecasting method with Adjusted R2 score value of 0.962, beating its rivals of six ML models. Furthermore, SHAP and LIME are examples of explainable Artificial Intelligence (XAI) utilized for understanding the reasons behind the obtained results.
期刊介绍:
The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.