Machine learning approaches for imputing missing meteorological data in Senegal
Mory Toure, Nana Ama Browne Klutse, Mamadou Adama Sarr, Md Abul Ehsan Bhuiyan, Annine Duclaire Kenne, Wassila Mamadou Thiaw, Daouda Badiane, Amadou Thierno Gaye, Ousmane Ndiaye, Cheikh Mbow
Applied Computing and Geosciences, vol. 27, Article 100281 (published 2025-08-15)
DOI: 10.1016/j.acags.2025.100281
URL: https://www.sciencedirect.com/science/article/pii/S2590197425000631
Abstract
This study presents the first comprehensive evaluation in West Africa of four imputation methods, Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGB), and Ordinary Kriging (OK), applied to six core meteorological variables across Senegal over a ten-year period (2015–2024). By simulating realistic missing data scenarios informed by field conditions (e.g., power outages, observer absences, sensor failures), it establishes a robust benchmark for climate data reconstruction using machine learning in resource-constrained settings.
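The gap-simulation step can be illustrated with a minimal sketch. The abstract does not specify how gaps were generated, so the function name `simulate_gaps`, the block length, and the 20 % default fraction below are assumptions chosen only to mimic outage-style contiguous gaps in a daily series.

```python
import random


def simulate_gaps(values, fraction=0.2, block_len=5, seed=0):
    """Hide about `fraction` of a series in contiguous blocks.

    Hypothetical helper: contiguous blocks stand in for power outages or
    sensor failures. Returns the masked copy (None marks a hidden value)
    and the sorted list of hidden indices, so the true values can later
    be compared with the imputed ones.
    """
    rng = random.Random(seed)
    n = len(values)
    target = int(round(fraction * n))  # number of values to hide
    hidden = set()
    while len(hidden) < target:
        start = rng.randrange(n)  # random start of an outage block
        for i in range(start, min(start + block_len, n)):
            if len(hidden) < target:
                hidden.add(i)
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)


# Usage: hide 20 % of a synthetic 100-day temperature series.
series = [25.0 + 0.1 * day for day in range(100)]
masked_series, hidden_idx = simulate_gaps(series, fraction=0.2, block_len=5, seed=1)
```

Keeping the hidden indices makes it possible to score any imputation method against the withheld observations.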
The findings highlight the clear superiority of ensemble learning approaches. XGB consistently outperformed all other methods across variables and scenarios, achieving the highest average predictive accuracy with R² values up to [95 % CI: 0.82–0.88], along with lower Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). RF yielded comparable performance, especially for maximum and minimum temperature (TMAX, TMIN), and remained stable even at 20 % missingness. In contrast, DT performance declined sharply with increased data loss, and OK was constrained by the sparse spatial distribution of meteorological stations, which notably impaired its ability to impute precipitation (PRCP) and wind speed (WDSP).
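For reference, the three scores named above can be computed as in the following sketch. It assumes R² is the coefficient of determination (one minus the ratio of residual to total sum of squares); the abstract does not say which definition the authors used, and the function names are illustrative.

```python
import math


def rmse(obs, pred):
    """Root Mean Square Error between observed and imputed values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean Absolute Error between observed and imputed values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)         # total sum of squares
    return 1.0 - ss_res / ss_tot


# Usage: score imputed values against the withheld observations.
observed = [1.0, 2.0, 3.0, 4.0]
imputed = [2.0, 2.0, 3.0, 3.0]
scores = {"rmse": rmse(observed, imputed),
          "mae": mae(observed, imputed),
          "r2": r2(observed, imputed)}
```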
This work contributes a multivariable imputation framework specifically adapted to West African climatic and infrastructural realities. It also integrates block bootstrap methods to quantify uncertainty and derive 95 % confidence intervals for all error metrics. Results confirm that imputation effectiveness is highly variable-dependent: continuous and temporally autocorrelated variables (TMAX, TMIN, and dew point temperature, DEWP) are well reconstructed, whereas discontinuous or noisy variables (WDSP and PRCP) remain challenging.
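One common way to obtain such intervals is a moving block bootstrap with percentile limits, sketched below. The abstract does not state which block bootstrap variant, block length, or number of resamples the authors used, so `block_len=7`, `n_boot=1000`, and the function names are assumptions for illustration only.

```python
import math
import random


def rmse_of_errors(errors):
    """RMSE computed directly from a list of (observed - imputed) errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def block_bootstrap_ci(errors, stat, block_len=7, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from a moving block bootstrap.

    Overlapping blocks of consecutive errors are resampled with
    replacement, which preserves short-range temporal autocorrelation.
    """
    n = len(errors)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(errors)")
    rng = random.Random(seed)
    n_blocks = math.ceil(n / block_len)
    stats = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            start = rng.randint(0, n - block_len)  # inclusive bounds
            sample.extend(errors[start:start + block_len])
        stats.append(stat(sample[:n]))  # trim to the original length
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Usage: approximate 95 % interval for the RMSE of synthetic imputation errors.
demo_rng = random.Random(42)
demo_errors = [demo_rng.gauss(0.0, 1.0) for _ in range(200)]
ci_low, ci_high = block_bootstrap_ci(demo_errors, rmse_of_errors)
```

The block length controls how much temporal dependence each resample retains; a longer block suits strongly autocorrelated variables such as temperature.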
Although ensemble models offer clear advantages, their computational demands and need for hyperparameter tuning may limit real-time implementation in low-resource national meteorological services. Furthermore, the exclusion of satellite or reanalysis inputs may constrain model generalizability.
Ultimately, this study reinforces the role of advanced machine learning methods in improving climate data completeness and reliability in Africa. Although not a substitute for direct observations, imputation emerges as a critical complementary tool to support robust and resilient climate information systems essential for agriculture, public health, and disaster risk management under intensifying climate variability.