Conformal validation: A deferral policy using uncertainty quantification with a human-in-the-loop for model validation

Paul Horton, Alexandru Florea, Brandon Stringfield

Machine Learning with Applications, Vol. 22, Article 100733 (2025). DOI: 10.1016/j.mlwa.2025.100733
Abstract
Validating performance is a key challenge facing the adoption of machine learning models in high-risk applications. Current validation methods assess performance marginally, i.e., averaged over the entire testing dataset, which can fail to identify regions of the distribution with insufficient performance. In this paper, we propose Conformal Validation, a systems-based approach that incorporates a calibrated form of uncertainty quantification, using a conformal prediction framework, into the validation process to reduce performance gaps. Specifically, the policy defers the subset of observations for which the predictive model is most uncertain and provides a human with informative prediction sets to make the ancillary decision. We evaluate this policy on an image classification task where images are distorted with varying levels of Gaussian blur as a quantifiable measure of added difficulty. The model is compared to human performance on the most difficult observations, i.e., those where the model is most uncertain, to simulate the scenario in which a human is the alternative decision-maker. We evaluate performance on three arms: the model alone, humans with access to the set of classes in which the model is most confident, and humans alone. The deferral policy is simple to understand, applicable to any predictive model, and easy to implement, while, in this case, keeping humans in the loop for improved trustworthiness. Conformal Validation incorporates a risk assessment that is conditioned on the prediction set length and can be tuned to the needs of the application.
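To make the deferral mechanism concrete, the sketch below shows a minimal split-conformal procedure that builds prediction sets from held-out calibration data and defers any observation whose set exceeds a chosen size. The function names, the softmax-based nonconformity score, and the `max_set_size` threshold are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for a generic classifier.

    cal_probs:  (n_cal, K) softmax scores on a held-out calibration set
    cal_labels: (n_cal,)   true labels for the calibration set
    test_probs: (n_test, K) softmax scores on the test set
    alpha:      target miscoverage rate (e.g., 0.1 for ~90% coverage)
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the softmax probability of the true class.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(cal_scores, min(q_level, 1.0), method="higher")
    # Include every class whose nonconformity score is below the threshold.
    return test_probs >= 1.0 - qhat  # boolean mask of shape (n_test, K)

def deferral_policy(pred_sets, max_set_size=1):
    """Defer observations whose prediction set is larger than max_set_size."""
    set_sizes = pred_sets.sum(axis=1)
    defer = set_sizes > max_set_size
    return defer, set_sizes
```

In this sketch, deferred observations would be routed to the human decision-maker along with their prediction sets, while the remainder are accepted from the model; tightening `alpha` or `max_set_size` trades model coverage against the human workload.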