Augmented robustness in home demand prediction: Integrating statistical loss function with enhanced cross-validation in machine learning hyperparameter optimisation
IF 9.6 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
{"title":"Augmented robustness in home demand prediction: Integrating statistical loss function with enhanced cross-validation in machine learning hyperparameter optimisation","authors":"Banafshe Parizad, Ali Jamali, Hamid Khayyam","doi":"10.1016/j.egyai.2025.100584","DOIUrl":null,"url":null,"abstract":"<div><div>Sustainable forecasting of home energy demand (SFHED) is crucial for promoting energy efficiency, minimizing environmental impact, and optimizing resource allocation. Machine learning (ML) supports SFHED by identifying patterns and forecasting demand. However, conventional hyperparameter tuning methods often rely solely on minimizing average prediction errors, typically through fixed k-fold cross-validation, which overlooks error variability and limits model robustness. To address this limitation, we propose the Optimized Robust Hyperparameter Tuning for Machine Learning with Enhanced Multi-fold Cross-Validation (ORHT-ML-EMCV) framework. This method integrates statistical analysis of k-fold validation errors by incorporating their mean and variance into the optimization objective, enhancing robustness and generalizability. A weighting factor is introduced to balance accuracy and robustness, and its impact is evaluated across a range of values. A novel Enhanced Multi-Fold Cross-Validation (EMCV) technique is employed to automatically evaluate model performance across varying fold configurations without requiring a predefined k value, thereby reducing sensitivity to data splits. Using three evolutionary algorithms Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Differential Evolution (DE) we optimize two ensemble models: XGBoost and LightGBM. The optimization process minimizes both mean error and variance, with robustness assessed through cumulative distribution function (CDF) analyses. Experiments on three real-world residential datasets show the proposed method reduces worst-case Root Mean Square Error (RMSE) by up to 19.8% and narrows confidence intervals by up to 25%. Cross-household validations confirm strong generalization, achieving coefficient of determination (R²) of 0.946 and 0.972 on unseen homes. The framework offers a statistically grounded and efficient solution for robust energy forecasting.</div></div>","PeriodicalId":34138,"journal":{"name":"Energy and AI","volume":"21 ","pages":"Article 100584"},"PeriodicalIF":9.6000,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Energy and AI","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666546825001168","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Cited by: 0
Abstract
Sustainable forecasting of home energy demand (SFHED) is crucial for promoting energy efficiency, minimizing environmental impact, and optimizing resource allocation. Machine learning (ML) supports SFHED by identifying patterns and forecasting demand. However, conventional hyperparameter tuning methods often rely solely on minimizing average prediction errors, typically through fixed k-fold cross-validation, which overlooks error variability and limits model robustness. To address this limitation, we propose the Optimized Robust Hyperparameter Tuning for Machine Learning with Enhanced Multi-fold Cross-Validation (ORHT-ML-EMCV) framework. This method integrates statistical analysis of k-fold validation errors by incorporating their mean and variance into the optimization objective, enhancing robustness and generalizability. A weighting factor is introduced to balance accuracy and robustness, and its impact is evaluated across a range of values. A novel Enhanced Multi-Fold Cross-Validation (EMCV) technique is employed to automatically evaluate model performance across varying fold configurations without requiring a predefined k value, thereby reducing sensitivity to data splits. Using three evolutionary algorithms, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Differential Evolution (DE), we optimize two ensemble models: XGBoost and LightGBM. The optimization process minimizes both mean error and variance, with robustness assessed through cumulative distribution function (CDF) analyses. Experiments on three real-world residential datasets show the proposed method reduces worst-case Root Mean Square Error (RMSE) by up to 19.8% and narrows confidence intervals by up to 25%. Cross-household validations confirm strong generalization, achieving coefficients of determination (R²) of 0.946 and 0.972 on unseen homes. The framework offers a statistically grounded and efficient solution for robust energy forecasting.
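To make the robustness-weighted objective concrete, the sketch below shows one plausible way to score a hyperparameter candidate by combining the mean and variance of k-fold RMSEs, evaluated over several fold counts as a stand-in for the paper's EMCV idea. This is a minimal illustration, not the authors' implementation: the weight ALPHA, the fold set K_VALUES, and the function name robust_cv_objective are assumptions introduced here for clarity.

# Minimal sketch, assuming a mean-plus-variance fitness over k-fold RMSEs.
# ALPHA and K_VALUES are illustrative choices, not values reported in the paper.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

K_VALUES = (3, 5, 7)  # assumed set of fold configurations swept in the EMCV-style loop
ALPHA = 0.5           # assumed weight trading off accuracy (mean) against robustness (variance)

def robust_cv_objective(params, X, y, alpha=ALPHA, k_values=K_VALUES, seed=0):
    """Fitness = alpha * mean(RMSE) + (1 - alpha) * var(RMSE) over all folds and k values."""
    rmses = []
    for k in k_values:
        cv = KFold(n_splits=k, shuffle=True, random_state=seed)
        for train_idx, val_idx in cv.split(X):
            model = XGBRegressor(**params)          # LightGBM's LGBMRegressor could be swapped in
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[val_idx])
            rmses.append(np.sqrt(mean_squared_error(y[val_idx], pred)))
    rmses = np.asarray(rmses)
    return alpha * rmses.mean() + (1.0 - alpha) * rmses.var()

An evolutionary optimiser, for example scipy.optimize.differential_evolution or an off-the-shelf GA/PSO library, would then minimise robust_cv_objective over the XGBoost or LightGBM hyperparameter space, and the collected fold RMSEs could afterwards be compared via their empirical CDFs to assess worst-case behaviour, in the spirit of the analysis described in the abstract.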