Evaluating station, satellite, & combined data for XGBoost-based visibility forecast
Michaela Schütz, Adrian Schütz, Jörg Bendix, Jonas Müller and Boris Thies
Atmospheric Research, volume 328, Article 108395. Published 2025-07-31. DOI: 10.1016/j.atmosres.2025.108395
https://www.sciencedirect.com/science/article/pii/S0169809525004879
Journal impact factor: 4.4 (JCR Q1, Meteorology & Atmospheric Sciences)
Abstract
Radiation fog poses challenges for very short-term weather forecasting because of its complex atmospheric dynamics. Accurate and spatially available visibility predictions are crucial for sectors where visibility conditions directly affect safety and operational efficiency. Traditional numerical weather prediction models lack real-time forecasting capabilities, so this study investigates a machine-learning-based visibility forecast with XGBoost for a station in Germany. Two data sources were used and compared: high-resolution station data and nationwide available Meteosat Second Generation (MSG) satellite data. The analysis investigates how the coarser spatial resolution of MSG data compares to the finer station data in predicting fog formation and dissipation. To this end, station-based predictors were substituted with MSG satellite data. Additionally, the study addresses data imbalances during training and evaluation by focusing on critical low-visibility conditions and specifically on fog formation and dissipation. XGBoost significantly outperforms the three baseline models: a purely visibility-driven forecast, a persistence model, and linear regression. The mean absolute error (MAE) is less than 150 m in the low-visibility range. For the predominantly MSG-variable-based model, only 3 % of fog formations and 6 % of fog dissipations are completely missed. Furthermore, the MSG model predicts 50 % of fog formations and 60 % of dissipations within a 30-min window of their actual occurrence. The model that uses MSG data as substitutes for station-based predictors performs comparably to the purely station-data-based forecast, highlighting the potential of area-wide accessible MSG data. However, visibility measurements remain necessary for forecasting. Future research should therefore develop satellite-derived products to replace visibility measurements, enabling fully spatial forecasts.
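The evaluation ideas named in the abstract (an MAE restricted to the low-visibility range, a persistence baseline, and a timing check for fog formation) can be illustrated with a minimal Python sketch. This is not the authors' code. The 1000 m fog threshold, the function names, and the symmetric tolerance in time steps are assumptions made for illustration; the abstract does not state how the low-visibility range or the 30-min window are defined.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: fog is taken as visibility below 1000 m. The abstract does not
# state the threshold used for its "low visibility range".
FOG_THRESHOLD_M = 1000.0


def low_visibility_mae(observed: Sequence[float], predicted: Sequence[float],
                       threshold_m: float = FOG_THRESHOLD_M) -> Optional[float]:
    """MAE over samples whose observed visibility is below threshold_m."""
    errors = [abs(o - p) for o, p in zip(observed, predicted) if o < threshold_m]
    return sum(errors) / len(errors) if errors else None


def persistence_pairs(observed: Sequence[float],
                      lead_steps: int) -> Tuple[List[float], List[float]]:
    """Return (targets, forecasts); each forecast is the value lead_steps earlier."""
    if lead_steps < 1:
        raise ValueError("lead_steps must be at least 1")
    values = list(observed)
    return values[lead_steps:], values[:len(values) - lead_steps]


def formation_onsets(visibility: Sequence[float],
                     threshold_m: float = FOG_THRESHOLD_M) -> List[int]:
    """Indices where visibility drops from >= threshold_m to < threshold_m."""
    return [i for i in range(1, len(visibility))
            if visibility[i - 1] >= threshold_m and visibility[i] < threshold_m]


def timing_hit_rate(observed_onsets: Sequence[int], predicted_onsets: Sequence[int],
                    tolerance_steps: int) -> Optional[float]:
    """Share of observed onsets matched by a predicted onset within
    +/- tolerance_steps (hypothetical reading of the 30-min window)."""
    if not observed_onsets:
        return None
    hits = sum(
        any(abs(o - p) <= tolerance_steps for p in predicted_onsets)
        for o in observed_onsets
    )
    return hits / len(observed_onsets)


if __name__ == "__main__":
    # Made-up visibility series in metres, for demonstration only.
    obs = [5000, 3000, 900, 400, 300, 1200, 4000]
    model = [5000, 2500, 1500, 600, 350, 900, 3500]

    targets, persisted = persistence_pairs(obs, lead_steps=1)
    print("model MAE (low visibility):", low_visibility_mae(obs, model))
    print("persistence MAE (low visibility):", low_visibility_mae(targets, persisted))
    print("formation hit rate:",
          timing_hit_rate(formation_onsets(obs), formation_onsets(model), 3))
```

Restricting the MAE to samples with low observed visibility keeps the many clear-air samples from dominating the score, which is one way to handle the data imbalance the abstract mentions. With 10-min data, for example, a tolerance of 3 steps would correspond to 30 min; the actual sampling interval is not given in the abstract.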
About the journal:
The journal publishes scientific papers (research papers, review articles, letters and notes) dealing with the part of the atmosphere where meteorological events occur. Attention is given to all processes extending from the earth surface to the tropopause, but special emphasis continues to be devoted to the physics of clouds, mesoscale meteorology and air pollution, i.e. atmospheric aerosols; microphysical processes; cloud dynamics and thermodynamics; numerical simulation, climatology, climate change and weather modification.