Performance of decision-tree-based ensemble classifiers in predicting fog frequency in ungauged areas

IF 3.1 | CAS Zone 3 (Earth Sciences) | JCR Q2, METEOROLOGY & ATMOSPHERIC SCIENCES

Weather and Forecasting, published 2023-10-09, DOI: 10.1175/waf-d-23-0024.1

Daeha Kim, Eunhee Kim, Eunji Kim


Abstract

Fog significantly affects transportation, aviation, air quality, agriculture, and even water resources. While data-driven machine learning algorithms have shown promising performance in capturing nonlinear fog events at point locations, their applicability to other areas and time periods is questionable. This study addresses the issue by examining five decision-tree-based classifiers in a South Korean region where diverse fog formation mechanisms are at play. The five machine learning algorithms were trained at point locations and tested at other point locations over time periods independent of those used in training. Using the ensemble classifiers and high-resolution atmospheric reanalysis data, we also attempted to construct fog occurrence maps for the region. Results showed that machine learning models trained on local datasets performed better in mountainous areas, where radiative cooling is the predominant contributor to fog formation, than in inland and coastal regions. As fog generation mechanisms became more diverse, the tree-based ensemble models appeared to struggle to delineate their decision boundaries. When the models were trained with reanalysis data, their predictive skill decreased significantly, resulting in high false alarm rates. This called for post-processing techniques to correct the overestimated fog frequency. While post-processing may ameliorate overestimation, the resulting fog frequency estimates should be interpreted with caution, especially in regions with more diverse fog generation mechanisms. The spatial upscaling of machine-learning-based fog prediction models is challenging owing to the intricate interplay of various fog formation mechanisms, data imbalance, and potential inaccuracies in reanalysis data.
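The abstract attributes the weakness of the reanalysis-trained models to high false alarm rates and overestimated fog frequency. The Python sketch below shows how those quantities are conventionally measured from a 2x2 contingency table (probability of detection, false alarm ratio, critical success index, frequency bias), together with one simple post-processing option: frequency matching of the decision threshold. Frequency matching is a swapped-in technique chosen for illustration; the abstract does not say which post-processing the authors used, and the function names and toy data are hypothetical.

```python
"""Hypothetical sketch: categorical verification of binary fog forecasts plus an
assumed frequency-matching post-processing step (not necessarily the study's method)."""

from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    pod: float   # probability of detection: hits / (hits + misses)
    far: float   # false alarm ratio: false alarms / (hits + false alarms)
    csi: float   # critical success index: hits / (hits + misses + false alarms)
    bias: float  # frequency bias: (hits + false alarms) / (hits + misses); > 1 means overforecasting


def verify(observed: Sequence[bool], predicted: Sequence[bool]) -> Scores:
    """Build the 2x2 contingency table and derive standard categorical scores."""
    hits = sum(o and p for o, p in zip(observed, predicted))
    misses = sum(o and not p for o, p in zip(observed, predicted))
    false_alarms = sum(p and not o for o, p in zip(observed, predicted))

    def ratio(num: int, den: int) -> float:
        # Undefined scores (e.g. no fog observed at all) are reported as NaN.
        return num / den if den else float("nan")

    return Scores(
        pod=ratio(hits, hits + misses),
        far=ratio(false_alarms, hits + false_alarms),
        csi=ratio(hits, hits + misses + false_alarms),
        bias=ratio(hits + false_alarms, hits + misses),
    )


def frequency_matching_threshold(probs: Sequence[float], target_freq: float) -> float:
    """Return a probability threshold whose exceedance frequency matches target_freq.

    Labelling the top round(target_freq * n) probabilities as fog reproduces the
    target (e.g. observed climatological) fog frequency. Tied probabilities at the
    threshold can make the predicted count slightly larger than the target.
    """
    n_fog = round(target_freq * len(probs))
    if n_fog <= 0:
        return float("inf")  # never predict fog
    ranked = sorted(probs, reverse=True)
    return ranked[n_fog - 1]


def fmt(s: Scores) -> str:
    return f"POD={s.pod:.2f} FAR={s.far:.2f} CSI={s.csi:.2f} bias={s.bias:.2f}"


if __name__ == "__main__":
    # Toy data: 10 hours, 2 observed fog events, model probabilities that run high.
    obs = [False, False, True, False, False, True, False, False, False, False]
    probs = [0.6, 0.55, 0.9, 0.2, 0.7, 0.65, 0.1, 0.52, 0.3, 0.05]

    raw = verify(obs, [p >= 0.5 for p in probs])
    print("raw      ", fmt(raw))  # POD=1.00 FAR=0.67 CSI=0.33 bias=3.00

    thr = frequency_matching_threshold(probs, sum(obs) / len(obs))
    adj = verify(obs, [p >= thr for p in probs])
    print("adjusted ", fmt(adj), "threshold", thr)  # POD=0.50 FAR=0.50 CSI=0.33 bias=1.00 threshold 0.7
```

In this toy case, matching the observed frequency removes the threefold overforecast (bias 3.00 to 1.00) but also halves the detection rate, which illustrates why a frequency estimate corrected by post-processing still needs cautious interpretation.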


Source journal: Weather and Forecasting (Geosciences: Meteorology & Atmospheric Sciences)

CiteScore: 5.20
Self-citation rate: 17.20%
Articles published: 131
Review time: 6-12 weeks

Journal description: Weather and Forecasting (WAF) (ISSN: 0882-8156; eISSN: 1520-0434) publishes research that is relevant to operational forecasting. This includes papers on significant weather events, forecasting techniques, forecast verification, model parameterizations, data assimilation, model ensembles, statistical postprocessing techniques, the transfer of research results to the forecasting community, and the societal use and value of forecasts. The scope of WAF includes research relevant to forecast lead times ranging from short-term “nowcasts” through seasonal time scales out to approximately two years.