Continuous Distributions and Measures of Statistical Accuracy for Structured Expert Judgment
Guus Rongen, Gabriela F. Nane, Oswaldo Morales-Napoles, Roger M. Cooke
FUTURES & FORESIGHT SCIENCE, Vol. 7, Issue 2. Published 2025-05-05. DOI: 10.1002/ffo2.70009
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ffo2.70009
This study evaluates five scoring rules, or measures of statistical accuracy, for assessing uncertainty estimates from expert judgment studies and model forecasts. These rules, the Continuous Ranked Probability Score (CRPS), Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD), and chi-square test, were applied to 6864 expert uncertainty estimates from 49 Classical Model (CM) studies. We compared their sensitivity to various biases and their ability to serve as performance-based weights for expert estimates. Additionally, the piecewise-uniform and Metalog distributions were evaluated for how well they represent expert estimates, because four of the five rules require interpolating the experts' quantile estimates. Simulating biased estimates reveals that the considered test statistics differ in their sensitivity to these biases. Expert weights derived using one measure of statistical accuracy were evaluated with the other measures to assess their performance. The main conclusions are: (1) CRPS overlooks important biases, while chi-square and AD behave similarly, as do KS and CvM; (2) all measures except CRPS agree that performance weighting is superior to equal weighting with respect to statistical accuracy; (3) neither distribution can effectively predict the position of a removed quantile estimate. These insights clarify the behavior of different scoring rules for combining uncertainty estimates from experts or models, and extend the knowledge base for best practices.
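As a rough illustration of how four of these measures operate, the Python sketch below (not the authors' code) builds a piecewise-uniform CDF from an expert's 5%/50%/95% quantile estimates, maps each realization to its probability integral transform (PIT) value, and tests those values for uniformity. The 10% overshoot used to bound the support and the toy data are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch, not the authors' implementation. Assumes each
# calibration question elicits 5%, 50%, and 95% quantiles and has a known
# realization; the 10% overshoot bounding the support is an assumed
# convention for closing the distribution's tails.
import numpy as np
from scipy import stats

PROBS = np.array([0.05, 0.50, 0.95])

def pit_values(quantiles, realizations):
    """Probability integral transform of each realization under the
    expert's piecewise-uniform (linearly interpolated) CDF."""
    pits = []
    for q, x in zip(quantiles, realizations):
        q = np.asarray(q, dtype=float)
        overshoot = 0.10 * (q[-1] - q[0])  # extend beyond outer quantiles
        support = np.concatenate([[q[0] - overshoot], q, [q[-1] + overshoot]])
        cdf = np.concatenate([[0.0], PROBS, [1.0]])
        pits.append(np.interp(x, support, cdf))  # clamps to [0, 1] outside
    return np.asarray(pits)

def anderson_darling_uniform(u):
    """Anderson-Darling statistic against Uniform(0, 1), computed directly
    since SciPy's anderson() does not cover the uniform case."""
    u = np.sort(np.clip(u, 1e-12, 1 - 1e-12))
    n, i = len(u), np.arange(1, len(u) + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

# Toy data: (5%, 50%, 95%) estimates and realizations for three questions.
estimates = [(2.0, 5.0, 9.0), (10.0, 20.0, 45.0), (0.1, 0.4, 0.8)]
realized = [4.2, 38.0, 0.75]

u = pit_values(estimates, realized)

# A statistically accurate expert yields PIT values indistinguishable from
# Uniform(0, 1); KS, CvM, AD, and chi-square all test that hypothesis,
# whereas CRPS instead rewards realizations landing near the median.
print("KS :", stats.kstest(u, "uniform"))
print("CvM:", stats.cramervonmises(u, "uniform"))
print("AD :", anderson_darling_uniform(u))

# Chi-square over inter-quantile bins, expected shares (5%, 45%, 45%, 5%).
counts, _ = np.histogram(u, bins=[0.0, 0.05, 0.50, 0.95, 1.0])
expected = len(u) * np.array([0.05, 0.45, 0.45, 0.05])
print("chi2:", stats.chisquare(counts, f_exp=expected))
```

In a performance-weighting scheme of the kind the study examines, the p-value from such a test would drive an expert's weight: experts whose PIT values look uniform are trusted more when pooling the uncertainty estimates.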