{"title":"竞争预测验证:用功率发散统计量检验“更好”的频率","authors":"E. Gilleland, D. Muñoz‐Esparza, David D. Turner","doi":"10.1175/waf-d-22-0201.1","DOIUrl":null,"url":null,"abstract":"\nWhen testing hypotheses about which of two competing models is better, say A and B, the difference is often not significant. An alternative, complementary approach, is to measure how often model A is better than model B regardless of how slight or large the difference. The hypothesis concerns whether or not the percentage of time that model A is better than model B is larger than 50%. One generalized test statistic that can be used is the power-divergence test, which encompasses many familiar goodness-of-fit test statistics, such as the loglikelihood-ratio and Pearson X2 tests. Theoretical results justify using the distribution for the entire family of test statistics, where k is the number of categories. However, these results assume that the underlying data are independent and identically distributed; which is often violated. Empirical results demonstrate that the reduction to two categories (i.e., model A is better than model B v. model B is better than A) results in a test that is reasonably robust to even severe departures from temporal independence, as well as contemporaneous correlation. The test is demonstrated on two different example verification sets: 6-h forecasts of eddy dissipation rate (m2/3s−1) from two versions of the Graphical Turbulence Guidence model and for 12-hour forecasts of 2-m temperature (°C) and 10-m wind speed (ms−1) from two versions of the High-Resolution Rapid Refresh model. The novelty of this paper is in demonstrating the utility of the power-divergence statistic in the face of temporally dependent data, as well as the emphasis on testing for the “frequency-of-better” alongside more traditional measures.","PeriodicalId":49369,"journal":{"name":"Weather and Forecasting","volume":" ","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Competing Forecast Verification: Using the Power-Divergence Statistic for Testing the Frequency of “Better”\",\"authors\":\"E. Gilleland, D. Muñoz‐Esparza, David D. Turner\",\"doi\":\"10.1175/waf-d-22-0201.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\nWhen testing hypotheses about which of two competing models is better, say A and B, the difference is often not significant. An alternative, complementary approach, is to measure how often model A is better than model B regardless of how slight or large the difference. The hypothesis concerns whether or not the percentage of time that model A is better than model B is larger than 50%. One generalized test statistic that can be used is the power-divergence test, which encompasses many familiar goodness-of-fit test statistics, such as the loglikelihood-ratio and Pearson X2 tests. Theoretical results justify using the distribution for the entire family of test statistics, where k is the number of categories. However, these results assume that the underlying data are independent and identically distributed; which is often violated. Empirical results demonstrate that the reduction to two categories (i.e., model A is better than model B v. model B is better than A) results in a test that is reasonably robust to even severe departures from temporal independence, as well as contemporaneous correlation. 
The test is demonstrated on two different example verification sets: 6-h forecasts of eddy dissipation rate (m2/3s−1) from two versions of the Graphical Turbulence Guidence model and for 12-hour forecasts of 2-m temperature (°C) and 10-m wind speed (ms−1) from two versions of the High-Resolution Rapid Refresh model. The novelty of this paper is in demonstrating the utility of the power-divergence statistic in the face of temporally dependent data, as well as the emphasis on testing for the “frequency-of-better” alongside more traditional measures.\",\"PeriodicalId\":49369,\"journal\":{\"name\":\"Weather and Forecasting\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Weather and Forecasting\",\"FirstCategoryId\":\"89\",\"ListUrlMain\":\"https://doi.org/10.1175/waf-d-22-0201.1\",\"RegionNum\":3,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"METEOROLOGY & ATMOSPHERIC SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Weather and Forecasting","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.1175/waf-d-22-0201.1","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"METEOROLOGY & ATMOSPHERIC SCIENCES","Score":null,"Total":0}
Competing Forecast Verification: Using the Power-Divergence Statistic for Testing the Frequency of “Better”
When testing hypotheses about which of two competing models, say A and B, is better, the difference is often not significant. An alternative, complementary approach is to measure how often model A is better than model B, regardless of how slight or large the difference. The hypothesis concerns whether the percentage of time that model A is better than model B exceeds 50%. One generalized test statistic that can be used is the power-divergence statistic, which encompasses many familiar goodness-of-fit test statistics, such as the log-likelihood-ratio and Pearson χ² tests. Theoretical results justify using the χ² distribution with k − 1 degrees of freedom for the entire family of test statistics, where k is the number of categories. However, these results assume that the underlying data are independent and identically distributed, an assumption that is often violated in practice. Empirical results demonstrate that the reduction to two categories (i.e., model A is better than model B vs. model B is better than model A) yields a test that is reasonably robust to even severe departures from temporal independence, as well as to contemporaneous correlation. The test is demonstrated on two example verification sets: 6-h forecasts of eddy dissipation rate (m^(2/3) s^(−1)) from two versions of the Graphical Turbulence Guidance model, and 12-h forecasts of 2-m temperature (°C) and 10-m wind speed (m s^(−1)) from two versions of the High-Resolution Rapid Refresh model. The novelty of this paper lies in demonstrating the utility of the power-divergence statistic in the face of temporally dependent data, as well as in its emphasis on testing the “frequency of better” alongside more traditional measures.
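As a concrete illustration of the two-category reduction described in the abstract, the short Python sketch below applies a power-divergence test to hypothetical "frequency-of-better" counts using SciPy's stats.power_divergence routine. The counts, the helper-function name, and the choice of the Cressie–Read exponent λ = 2/3 are illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from scipy import stats

def frequency_of_better_test(n_a_better, n_b_better, lam=2/3):
    """Two-category power-divergence test of H0: P(A better than B) = 0.5.

    lam = 1 recovers Pearson's X^2 statistic; lam -> 0 recovers the
    log-likelihood-ratio (G^2) statistic.  Under H0 the statistic is
    approximately chi-squared with k - 1 = 1 degree of freedom.
    """
    observed = np.array([n_a_better, n_b_better], dtype=float)
    expected = np.full(2, observed.sum() / 2.0)  # equal counts expected under H0
    stat, pval = stats.power_divergence(observed, f_exp=expected, lambda_=lam)
    return stat, pval

# Hypothetical example: model A has the smaller error at 620 of 1000
# verification times.
stat, pval = frequency_of_better_test(620, 380)
print(f"statistic = {stat:.2f}, p-value = {pval:.4f}")
```

With lambda_=1 the same call reproduces the familiar Pearson χ² test, and with lambda_="log-likelihood" it reproduces the G² test, which is the sense in which the power-divergence family encompasses those statistics.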
Journal introduction:
Weather and Forecasting (WAF) (ISSN: 0882-8156; eISSN: 1520-0434) publishes research that is relevant to operational forecasting. This includes papers on significant weather events, forecasting techniques, forecast verification, model parameterizations, data assimilation, model ensembles, statistical postprocessing techniques, the transfer of research results to the forecasting community, and the societal use and value of forecasts. The scope of WAF includes research relevant to forecast lead times ranging from short-term “nowcasts” through seasonal time scales out to approximately two years.