Yasmin Uchôa da Silva, Gutemberg Borges França, Heloisa Musetti Ruivo, Haroldo Fraga de Campos Velho
Applied Computing and Geosciences, vol. 16, Article 100099. Published 2022-12-01. DOI: 10.1016/j.acags.2022.100099. https://www.sciencedirect.com/science/article/pii/S2590197422000210
Forecast of convective events via hybrid model: WRF and machine learning algorithms
This paper presents a novel hybrid 24-h forecasting model of convective weather events based on numerical simulation and machine learning algorithms. To characterize the convective events, 13 years (2008 to 2020) of precipitation data from the main airport stations in Rio de Janeiro, Brazil, and of atmospheric discharges from the surrounding area of around 150 km were investigated. The Weather Research and Forecasting (WRF) model was used to numerically simulate atmospheric conditions for every day in February, the month with the greatest daily rate of atmospheric discharges in the data period. The p-value hypothesis test (with α = 0.05) was applied to each grid point of the numerically predicted variables (each defined as an independent attribute) in the 3-D WRF output grid to find those most associated with convective events. This test identified 36 attributes (or predictors), which were used as input to the training-test process of the machine learning algorithms. Several cross-validation training and testing experiments were carried out using the nine selected categorical machine learning algorithms and the 36 defined predictors. After the boosting technique was applied to the nine previously trained and tested algorithms, the results of the 24-h predictions of convective occurrences were deemed satisfactory. The RandomForest method produced the best results, with verification statistics close to perfect: probability of detection (POD) = 1.00, false alarm ratio (FAR) = 0.02, and critical success index (CSI) = 0.98. The 24-h hindcast using the nine algorithms for the 28 days of February 2019 was very encouraging: it almost reproduced the maturation phase of the events, and its occasional failures occurred during the formation and dissipation phases. The best and worst 24-h hindcasts had POD = 0.97 and 0.88, FAR = 0.02 and 0.12, and CSI = 0.94 and 0.78, respectively.
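The three scores quoted in the abstract follow the standard categorical-forecast definitions based on a 2x2 contingency table. The sketch below shows how they are computed. The function name and the counts in the usage lines are hypothetical illustrations, not values taken from the paper; the counts were chosen only because they yield scores of the same magnitude as those reported for RandomForest.

```python
def verification_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI) from 2x2 contingency-table counts.

    hits:         event forecast and observed
    misses:       event observed but not forecast
    false_alarms: event forecast but not observed
    """
    if hits + misses == 0:
        raise ValueError("POD is undefined: no observed events")
    if hits + false_alarms == 0:
        raise ValueError("FAR is undefined: no forecast events")

    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    return pod, far, csi


# Hypothetical counts (assumed for illustration, not from the paper).
pod, far, csi = verification_scores(hits=49, misses=0, false_alarms=1)
print(f"POD = {pod:.2f}, FAR = {far:.2f}, CSI = {csi:.2f}")
```

Correct rejections (no event forecast, none observed) do not enter any of the three scores, which is why these scores are commonly preferred for relatively rare events such as convective storms.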