{"title":"Performance of decision-tree-based ensemble classifiers in predicting fog frequency in ungauged areas","authors":"Daeha Kim, Eunhee Kim, Eunji Kim","doi":"10.1175/waf-d-23-0024.1","DOIUrl":null,"url":null,"abstract":"Abstract Fog is a phenomenon that exerts significant impacts on transportation, aviation, air quality, agriculture, and even water resources. While data-driven machine learning algorithms have shown promising performance in capturing non-linear fog events at point locations, their applicability to different areas and time periods is questionable. This study addresses this issue by examining five decision-tree-based classifiers in a South Korean region, where diverse fog formation mechanisms are at play. The five machine learning algorithms were trained at point locations, and tested with other point locations for time periods independent of the training processes. Using the ensemble classifiers and high-resolution atmospheric reanalysis data, we also attempted to establish fog occurrence maps in a regional area. Results showed that machine learning models trained on the local datasets exhibited superior performance in mountainous areas, where radiative cooling predominantly contributes to fog formation, compared to inland and coastal regions. As the fog generation mechanisms diversified, the tree-based ensemble models appeared to encounter challenges in delineating their decision boundaries. When they were trained with the reanalysis data, their predictive skills were significantly decreased, resulting in high false alarm rates. This prompted the need for post-processing techniques to rectify overestimated fog frequency. While post-processing may ameliorate overestimation, caution is needed to interpret the resultant fog frequency estimates, especially in regions with more diverse fog generation mechanisms. The spatial upscaling of machine-learning-based fog prediction models poses challenges owing to the intricate interplay of various fog formation mechanisms, data imbalances, and potential inaccuracies in reanalysis data.","PeriodicalId":49369,"journal":{"name":"Weather and Forecasting","volume":"46 1","pages":"0"},"PeriodicalIF":3.0000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Weather and Forecasting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1175/waf-d-23-0024.1","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"METEOROLOGY & ATMOSPHERIC SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract Fog is a phenomenon that exerts significant impacts on transportation, aviation, air quality, agriculture, and even water resources. While data-driven machine learning algorithms have shown promising performance in capturing non-linear fog events at point locations, their applicability to different areas and time periods is questionable. This study addresses this issue by examining five decision-tree-based classifiers in a South Korean region, where diverse fog formation mechanisms are at play. The five machine learning algorithms were trained at point locations, and tested with other point locations for time periods independent of the training processes. Using the ensemble classifiers and high-resolution atmospheric reanalysis data, we also attempted to establish fog occurrence maps in a regional area. Results showed that machine learning models trained on the local datasets exhibited superior performance in mountainous areas, where radiative cooling predominantly contributes to fog formation, compared to inland and coastal regions. As the fog generation mechanisms diversified, the tree-based ensemble models appeared to encounter challenges in delineating their decision boundaries. When they were trained with the reanalysis data, their predictive skills were significantly decreased, resulting in high false alarm rates. This prompted the need for post-processing techniques to rectify overestimated fog frequency. While post-processing may ameliorate overestimation, caution is needed to interpret the resultant fog frequency estimates, especially in regions with more diverse fog generation mechanisms. The spatial upscaling of machine-learning-based fog prediction models poses challenges owing to the intricate interplay of various fog formation mechanisms, data imbalances, and potential inaccuracies in reanalysis data.
期刊介绍:
Weather and Forecasting (WAF) (ISSN: 0882-8156; eISSN: 1520-0434) publishes research that is relevant to operational forecasting. This includes papers on significant weather events, forecasting techniques, forecast verification, model parameterizations, data assimilation, model ensembles, statistical postprocessing techniques, the transfer of research results to the forecasting community, and the societal use and value of forecasts. The scope of WAF includes research relevant to forecast lead times ranging from short-term “nowcasts” through seasonal time scales out to approximately two years.