Predicting Tropical Cyclone Extreme Rainfall in Guangxi, China: An Interpretable Machine Learning Framework Addressing Class Imbalance and Feature Optimization
Yuexing Cai, Cuiyin Huang, Fengqin Zheng, Guangtao Li, Sheng Lai, Liyun Zhu, Qiuyu Zhu
Meteorological Applications, vol. 32, no. 3, published 2025-05-09. DOI: 10.1002/met.70052
Accurate prediction of tropical cyclone-induced extreme rainfall (TCER) is crucial for disaster mitigation in coastal regions. However, it remains a formidable challenge because of the complex interactions among multi-scale meteorological factors and the inherent imbalance in the data. This study presents an interpretable machine learning (ML) framework for predicting both the occurrence and the magnitude of TCER in Guangxi (GX), China. The framework combines three supervised learning algorithms (XGBoost, Random Forest, and AdaBoost) with feature selection techniques and an explainability method. A total of 202 experiments were conducted to evaluate the framework's performance. Genetic Algorithm (GA) optimization and Shapley additive explanations (SHAP) were used to identify optimal feature subsets and to quantify the contribution of each variable. The optimized XGBoost model performed strongly: integrating 18 predictors spanning dynamic, thermodynamic, moisture, and precursor variables, it achieved a Threat Score of 0.41 for classifying TCER occurrence and 0.49 for regressing rainfall magnitude, and it outperformed the TIGGE ensemble data in case studies of typhoons Chaba (2022) and Doksuri (2023). SHAP analysis showed that Distance to Track is the most important factor for TCER occurrence, and it also revealed nonlinear interactions among predictors. For instance, increased vertical wind shear, favorable thermal conditions, ascending motion, and subtropical high activity can substantially raise the likelihood of TCER when combined with low-level humidity accumulation. Moreover, time-lagged and time-evolution variables captured precursor signals of TCER events, such as humidity accumulation, circulation adjustment, and changes in typhoon intensity, indicating that the model makes effective use of these precursors. This study therefore demonstrates the potential of ML to improve TCER prediction while maintaining physical interpretability, and it offers a useful reference for addressing class imbalance in similar research fields.
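The abstract reports skill as a Threat Score (TS) for both the occurrence classification and the magnitude regression. The sketch below uses the standard TS definition, hits / (hits + misses + false alarms). Applying it to the regression output by thresholding both observed and predicted rainfall at an extreme-rainfall threshold is an assumption, as is the threshold value in the example, because the abstract does not state how the regression TS was computed. The function names `threat_score` and `threat_score_from_rainfall` are hypothetical and not taken from the paper's code.

```python
from typing import Sequence


def threat_score(observed: Sequence[bool], forecast: Sequence[bool]) -> float:
    """Threat Score (critical success index) for binary event forecasts.

    TS = hits / (hits + misses + false_alarms). Correct negatives are
    not counted, so predicting "no event" everywhere cannot inflate it.
    """
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    hits = sum(1 for o, f in zip(observed, forecast) if o and f)
    misses = sum(1 for o, f in zip(observed, forecast) if o and not f)
    false_alarms = sum(1 for o, f in zip(observed, forecast) if f and not o)
    denom = hits + misses + false_alarms
    # TS is undefined when neither series contains an event.
    if denom == 0:
        return float("nan")
    return hits / denom


def threat_score_from_rainfall(
    observed_mm: Sequence[float],
    forecast_mm: Sequence[float],
    threshold_mm: float,
) -> float:
    """TS for continuous rainfall after converting both series to events.

    ASSUMPTION: an event is rainfall >= threshold_mm; the threshold is
    supplied by the caller and is not the paper's stated value.
    """
    observed = [x >= threshold_mm for x in observed_mm]
    forecast = [x >= threshold_mm for x in forecast_mm]
    return threat_score(observed, forecast)


if __name__ == "__main__":
    # 2 hits, 1 miss, 1 false alarm, 2 correct negatives -> 2 / 4
    print(threat_score([True, True, False, False, True, False],
                       [True, False, True, False, True, False]))  # 0.5
    # Hypothetical 50 mm threshold: 1 hit, 1 miss, 1 false alarm -> 1 / 3
    print(round(threat_score_from_rainfall([120.0, 30.0, 80.0, 10.0],
                                           [95.0, 60.0, 40.0, 5.0],
                                           50.0), 3))  # 0.333
```

Because correct negatives do not enter the formula, the TS is not inflated by the many non-event samples in an imbalanced dataset, which makes it a natural verification score for a rare event such as TCER.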
About the journal:
The aim of Meteorological Applications is to serve the needs of applied meteorologists, forecasters and users of meteorological services by publishing papers on all aspects of meteorological science, including:
applications of meteorological, climatological, analytical and forecasting data, and their socio-economic benefits;
forecasting, warning and service delivery techniques and methods;
weather hazards, their analysis and prediction;
performance, verification and value of numerical models and forecasting services;
practical applications of ocean and climate models;
education and training.