Conformal validation: A deferral policy using uncertainty quantification with a human-in-the-loop for model validation

Impact Factor: 4.9
Paul Horton, Alexandru Florea, Brandon Stringfield
{"title":"Conformal validation: A deferral policy using uncertainty quantification with a human-in-the-loop for model validation","authors":"Paul Horton,&nbsp;Alexandru Florea,&nbsp;Brandon Stringfield","doi":"10.1016/j.mlwa.2025.100733","DOIUrl":null,"url":null,"abstract":"<div><div>Validating performance is a key challenge facing the adoption of machine learning models in high risk applications. Current validation methods assess performance marginally over the entire testing dataset, which can fail to identify regions in the distribution with insufficient performance. In this paper, we propose Conformal Validation, a systems-based approach with a calibrated form of uncertainty quantification using a conformal prediction framework as a part of the validation process to reduce performance gaps. Specifically, the policy defers a subset of observations for which the predictive model is most uncertain and provides a human with informative prediction sets to make the ancillary decision. We evaluate this policy on an image classification task where images are distorted with varying levels of gaussian blur for a quantifiable measure of added difficulty. The model is compared to human performance on the most difficult observations, i.e., those where the model is most uncertain, to simulate the scenario when a human is the alternative decision-maker. We evaluate performance on three arms: the model independently, humans with access to a set of classes the model is most confident in, and humans independently. The deferral policy is simple to understand, applicable to any predictive model, and easy to implement while, in this case, keeping humans in the loop for improved trustworthiness. Conformal Validation incorporates a risk assessment that is conditioned on the prediction set length and can be tuned to the needs of the application.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"22 ","pages":"Article 100733"},"PeriodicalIF":4.9000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025001161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Validating performance is a key challenge facing the adoption of machine learning models in high-risk applications. Current validation methods assess performance marginally over the entire testing dataset, which can fail to identify regions of the distribution with insufficient performance. In this paper, we propose Conformal Validation, a systems-based approach that incorporates a calibrated form of uncertainty quantification, using a conformal prediction framework, into the validation process to reduce performance gaps. Specifically, the policy defers a subset of observations for which the predictive model is most uncertain and provides a human with informative prediction sets to make the ancillary decision. We evaluate this policy on an image classification task in which images are distorted with varying levels of Gaussian blur, giving a quantifiable measure of added difficulty. The model is compared to human performance on the most difficult observations, i.e., those where the model is most uncertain, to simulate the scenario in which a human is the alternative decision-maker. We evaluate performance on three arms: the model independently, humans with access to a set of classes the model is most confident in, and humans independently. The deferral policy is simple to understand, applicable to any predictive model, and easy to implement while, in this case, keeping humans in the loop for improved trustworthiness. Conformal Validation incorporates a risk assessment that is conditioned on prediction set length and can be tuned to the needs of the application.
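Below is a minimal sketch of the deferral idea the abstract describes: split conformal prediction calibrates a score threshold so that prediction sets achieve a target coverage level, and any case whose set exceeds a chosen size is deferred to a human together with that set. The function names (conformal_quantile, deferral_policy), the max_set_size parameter, and the softmax-based nonconformity score are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of conformal deferral; names and score choice are
# assumptions, not the paper's exact method.
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on a held-out calibration set.

    Nonconformity score: 1 - softmax probability of the true class.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile for (1 - alpha) coverage.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def prediction_sets(test_probs, q_hat):
    """Class j enters the set when its nonconformity score is <= q_hat."""
    return (1.0 - test_probs) <= q_hat

def deferral_policy(test_probs, q_hat, max_set_size=1):
    """Defer to a human whenever the prediction set is larger than
    max_set_size, i.e., the model is too uncertain to decide alone."""
    sets = prediction_sets(test_probs, q_hat)
    defer = sets.sum(axis=1) > max_set_size
    model_preds = test_probs.argmax(axis=1)
    return model_preds, sets, defer

# Usage with synthetic probabilities: defer uncertain cases and show the
# human reviewer the (small) prediction set for each deferred case.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
test_probs = rng.dirichlet(np.ones(10), size=100)

q_hat = conformal_quantile(cal_probs, cal_labels, alpha=0.1)
preds, sets, defer = deferral_policy(test_probs, q_hat, max_set_size=2)
print(f"Deferred {defer.sum()} of {len(defer)} cases to the human reviewer.")
```

Tightening max_set_size defers more cases to the human, which is how a risk assessment conditioned on prediction set length could be tuned to an application's tolerance.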
Source journal
Machine Learning with Applications
Subject areas: Management Science and Operations Research; Artificial Intelligence; Computer Science Applications
Self-citation rate: 0.00%
Articles published: 0
Review time: 98 days