Evaluating probabilistic classifiers: The triptych

Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan, Peter Vogel
International Journal of Forecasting (IF 6.9; Zone 2, Economics; JCR Q1, Economics)
Published 2023-11-04. DOI: 10.1016/j.ijforecast.2023.09.007
URL: https://www.sciencedirect.com/science/article/pii/S0169207023000997

Abstract

Probability forecasts for binary outcomes, often referred to as probabilistic classifiers or confidence scores, are ubiquitous in science and society, and methods for evaluating and comparing them are in great demand. We propose and study a triptych of diagnostic graphics focusing on distinct and complementary aspects of forecast performance: Reliability curves address calibration, receiver operating characteristic (ROC) curves diagnose discrimination ability, and Murphy curves visualize overall predictive performance and value. A Murphy curve shows a forecast’s mean elementary scores, including the widely used misclassification rate, and the area under a Murphy curve equals the mean Brier score. For a calibrated forecast, the reliability curve lies on the diagonal, and for competing calibrated forecasts, the ROC and Murphy curves share the same number of crossing points. We invoke the recently developed CORP (Consistent, Optimally binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm-based) approach to craft reliability curves and decompose a mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components. Plots of the DSC measure of discrimination ability versus the calibration metric MCB visualize classifier performance across multiple competitors. The proposed tools are illustrated in empirical examples from astrophysics, economics, and social science.
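The quantities named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names `murphy_curve` and `corp_decomposition` are hypothetical, the Brier score is assumed as the mean score being decomposed, and the elementary scores are scaled by a factor of 2, an assumption chosen so that the area under the Murphy curve matches the mean Brier score and the value at threshold 0.5 matches the misclassification rate, as the abstract states.

```python
import numpy as np

def murphy_curve(x, y, thetas):
    """Mean elementary score at each threshold theta (factor-2 scaling assumed)."""
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y)[:, None]
    t = np.asarray(thetas, float)[None, :]
    false_alarm = (y == 0) & (x > t)   # forecast above threshold, event absent
    miss = (y == 1) & (x <= t)         # forecast at or below threshold, event present
    return np.mean(2 * t * false_alarm + 2 * (1 - t) * miss, axis=0)

def corp_decomposition(x, y):
    """Brier score split into MCB, DSC, UNC using PAV-recalibrated forecasts."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    # Group tied forecast values so they receive the same recalibrated value.
    _, inv, cnt = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inv, weights=y)
    # Pool-adjacent-violators on the groups, ordered by forecast value.
    blocks = []  # each block: [sum of outcomes, number of cases, number of groups]
    for s, n in zip(sums, cnt):
        blocks.append([s, n, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted = np.concatenate([np.full(k, s / n) for s, n, k in blocks])
    x_rc = fitted[inv]  # recalibrated forecast for each original case

    def brier(p):
        return np.mean((p - y) ** 2)

    s_x = brier(x)                              # score of the original forecast
    s_rc = brier(x_rc)                          # score of the recalibrated forecast
    s_ref = brier(np.full_like(y, y.mean()))    # score of the constant base-rate forecast
    return {"score": s_x, "MCB": s_x - s_rc, "DSC": s_ref - s_rc, "UNC": s_ref}

# Toy data, made up for illustration only.
x = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
y = [0, 0, 1, 1, 1, 0]
parts = corp_decomposition(x, y)
thetas = (np.arange(20000) + 0.5) / 20000       # midpoint grid on (0, 1)
area = murphy_curve(x, y, thetas).mean()        # Riemann approximation of the area
print(parts, area)
```

By construction the components satisfy score = MCB - DSC + UNC, and the reliability curve in the CORP sense is the plot of the recalibrated values `x_rc` against the original forecasts `x`.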

Source journal: International Journal of Forecasting
CiteScore: 17.10; self-citation rate: 11.40%; articles published: 189; review time: 77 days
About the journal: The International Journal of Forecasting is a leading journal in its field that publishes high quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision and policy makers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.