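Predicting Tropical Cyclone Extreme Rainfall in Guangxi, China: An Interpretable Machine Learning Framework Addressing Class Imbalance and Feature Optimization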

Impact factor 2.3 · Zone 4 (Earth Sciences) · JCR Q3 (Meteorology & Atmospheric Sciences)
Yuexing Cai, Cuiyin Huang, Fengqin Zheng, Guangtao Li, Sheng Lai, Liyun Zhu, Qiuyu Zhu
Meteorological Applications, 32(3). Published 2025-05-09. Journal article. DOI: 10.1002/met.70052
https://onlinelibrary.wiley.com/doi/10.1002/met.70052
Citations: 0

Abstract


Accurate prediction of tropical cyclone-induced extreme rainfall (TCER) is essential for disaster mitigation in coastal regions, but it remains difficult because of the intricate interactions among multi-scale meteorological factors and inherent data imbalance. This study presents an interpretable machine learning (ML) framework for predicting both the occurrence and the magnitude of TCER in Guangxi (GX), China. The framework integrates three supervised learning algorithms (XGBoost, Random Forest, and AdaBoost) with feature selection techniques and an explainability method. A total of 202 experiments were conducted to evaluate its performance. Genetic Algorithm (GA) optimization and Shapley additive explanations (SHAP) were used to identify optimal feature subsets and to quantify the contribution of each variable. The optimized XGBoost model, which integrates 18 predictors spanning dynamic, thermodynamic, moisture, and precursor variables, achieved a Threat Score of 0.41 for the classification of TCER occurrence and a Threat Score of 0.49 for the regression of rainfall magnitude, outperforming the TIGGE ensemble data in case studies of typhoons Chaba (2022) and Doksuri (2023). SHAP analysis identified Distance to Track as the most important factor for TCER occurrence and revealed nonlinear interactions: an increase in vertical wind shear, favorable thermal conditions, ascending motion, and subtropical high activity can substantially raise the likelihood of TCER when coupled with low-level humidity accumulation. Time-lagged and time-evolution variables captured precursor signals of TCER events, such as humidity accumulation, circulation adjustment, and typhoon intensity changes. The study shows the potential of ML to improve TCER prediction while maintaining physical interpretability, and it offers a reference for addressing data imbalance in similar research fields.
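
The abstract reports skill as a Threat Score for both the occurrence classifier and the magnitude regressor. As a rough illustration only, the sketch below uses the standard definition of the Threat Score (also called the Critical Success Index): hits divided by the sum of hits, misses, and false alarms. The abstract does not say how regression output was converted into events, so the 50 mm threshold, the helper name `exceeds`, and the toy rainfall values are all assumptions made for this example, not details from the paper.

```python
from typing import Sequence


def threat_score(observed: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Threat Score (CSI): hits / (hits + misses + false alarms)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    misses = sum(1 for o, p in zip(observed, predicted) if o and not p)
    false_alarms = sum(1 for o, p in zip(observed, predicted) if p and not o)
    denominator = hits + misses + false_alarms
    # With no observed and no predicted events the score is undefined: return NaN.
    return hits / denominator if denominator else float("nan")


def exceeds(values: Sequence[float], threshold_mm: float) -> list[bool]:
    """Turn rainfall amounts into event flags (hypothetical helper)."""
    return [v >= threshold_mm for v in values]


# Toy data, not from the study: observed and predicted rainfall in mm.
observed_mm = [120.0, 5.0, 80.0, 0.0, 60.0, 2.0]
predicted_mm = [95.0, 55.0, 30.0, 1.0, 70.0, 0.0]
THRESHOLD_MM = 50.0  # assumed threshold for an "extreme" event

ts = threat_score(exceeds(observed_mm, THRESHOLD_MM), exceeds(predicted_mm, THRESHOLD_MM))
print(f"Threat Score: {ts:.2f}")  # 2 hits, 1 miss, 1 false alarm -> 0.50
```

Correct negatives do not enter the score, which is why it suits rare events: a model that never predicts extreme rainfall scores 0 here even though its plain accuracy on an imbalanced dataset could look high.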

Source journal: Meteorological Applications (Earth Sciences: Meteorology & Atmospheric Sciences)
CiteScore: 5.70
Self-citation rate: 3.70%
Articles published: 62
Review time: >12 weeks
Journal description: The aim of Meteorological Applications is to serve the needs of applied meteorologists, forecasters and users of meteorological services by publishing papers on all aspects of meteorological science, including: applications of meteorological, climatological, analytical and forecasting data, and their socio-economic benefits; forecasting, warning and service delivery techniques and methods; weather hazards, their analysis and prediction; performance, verification and value of numerical models and forecasting services; practical applications of ocean and climate models; education and training.