Continuous Distributions and Measures of Statistical Accuracy for Structured Expert Judgment
Guus Rongen, Gabriela F. Nane, Oswaldo Morales-Napoles, Roger M. Cooke
FUTURES & FORESIGHT SCIENCE, Vol. 7, Issue 2. Published 2025-05-05. DOI: 10.1002/ffo2.70009
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ffo2.70009
This study evaluates five scoring rules, or measures of statistical accuracy, for assessing uncertainty estimates from expert judgment studies and model forecasts. These rules, the Continuous Ranked Probability Score (CRPS), Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD), and chi-square test, were applied to 6864 expert uncertainty estimates from 49 Classical Model (CM) studies. We compared their sensitivity to various biases and their ability to serve as performance-based weights for expert estimates. Additionally, the piecewise-uniform and Metalog distributions were evaluated for how well they represent expert estimates, because four of the five rules require interpolating the experts' quantile estimates. Simulating biased estimates reveals that the considered test statistics differ in their sensitivity to these biases. Expert weights derived using one measure of statistical accuracy were evaluated with the other measures to assess their performance. The main conclusions are: (1) CRPS overlooks important biases, while chi-square and AD behave similarly, as do KS and CvM; (2) all measures except CRPS agree that performance weighting is superior to equal weighting with respect to statistical accuracy; (3) neither distribution can effectively predict the position of a removed quantile estimate. These insights clarify the behavior of different scoring rules for combining uncertainty estimates from experts or models, and extend the knowledge base for best practices.
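As a rough illustration of how four of these measures operate, the Python sketch below (not the authors' code) builds a piecewise-uniform CDF from an expert's 5%/50%/95% quantile estimates, maps each realization to its probability integral transform (PIT) value, and tests those values for uniformity. The 10% overshoot used to bound the support and the toy data are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch, not the authors' implementation. Assumes each
# calibration question elicits 5%, 50%, and 95% quantiles and has a known
# realization; the 10% overshoot bounding the support is an assumed
# convention for closing the distribution's tails.
import numpy as np
from scipy import stats

PROBS = np.array([0.05, 0.50, 0.95])

def pit_values(quantiles, realizations):
    """Probability integral transform of each realization under the
    expert's piecewise-uniform (linearly interpolated) CDF."""
    pits = []
    for q, x in zip(quantiles, realizations):
        q = np.asarray(q, dtype=float)
        overshoot = 0.10 * (q[-1] - q[0])  # extend beyond outer quantiles
        support = np.concatenate([[q[0] - overshoot], q, [q[-1] + overshoot]])
        cdf = np.concatenate([[0.0], PROBS, [1.0]])
        pits.append(np.interp(x, support, cdf))  # clamps to [0, 1] outside
    return np.asarray(pits)

def anderson_darling_uniform(u):
    """Anderson-Darling statistic against Uniform(0, 1), computed directly
    since SciPy's anderson() does not cover the uniform case."""
    u = np.sort(np.clip(u, 1e-12, 1 - 1e-12))
    n, i = len(u), np.arange(1, len(u) + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

# Toy data: (5%, 50%, 95%) estimates and realizations for three questions.
estimates = [(2.0, 5.0, 9.0), (10.0, 20.0, 45.0), (0.1, 0.4, 0.8)]
realized = [4.2, 38.0, 0.75]

u = pit_values(estimates, realized)

# A statistically accurate expert yields PIT values indistinguishable from
# Uniform(0, 1); KS, CvM, AD, and chi-square all test that hypothesis,
# whereas CRPS instead rewards realizations landing near the median.
print("KS :", stats.kstest(u, "uniform"))
print("CvM:", stats.cramervonmises(u, "uniform"))
print("AD :", anderson_darling_uniform(u))

# Chi-square over inter-quantile bins, expected shares (5%, 45%, 45%, 5%).
counts, _ = np.histogram(u, bins=[0.0, 0.05, 0.50, 0.95, 1.0])
expected = len(u) * np.array([0.05, 0.45, 0.45, 0.05])
print("chi2:", stats.chisquare(counts, f_exp=expected))
```

In a performance-weighting scheme of the kind the study examines, the p-value from such a test would drive an expert's weight: experts whose PIT values look uniform are trusted more when pooling the uncertainty estimates.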