Investigating Normalized Conformal Regressors

U. Johansson, Henrik Boström, Tuwe Löfström
{"title":"Investigating Normalized Conformal Regressors","authors":"U. Johansson, Henrik Boström, Tuwe Löfström","doi":"10.1109/SSCI50451.2021.9659853","DOIUrl":null,"url":null,"abstract":"Conformal prediction can be applied on top of any machine learning predictive regression model, thus turning it into a conformal regressor. Given a significance level $\\epsilon$, conformal regressors output valid prediction intervals, i.e., the probability that the interval covers the true value is exactly $1-\\epsilon$. To obtain validity, a calibration set that is not used for training the model must be set aside. In standard inductive conformal regression, the size of the prediction intervals is then determined by the absolute error made by the predictive model on a specific instance in the calibration set, where different significance levels correspond to different instances. In this setting, all prediction intervals will have the same size, making the resulting models very unspecific. When adding a technique called normalization, however, the difficulty of each instance is estimated, and the interval sizes are adjusted accordingly. An integral part of normalized conformal regressors is a parameter called $\\beta$, which determines the relative importance of the difficulty estimation and the error of the model. In this study, the effects of different underlying models, difficulty estimation functions and $\\beta$ -values are investigated. The results from a large empirical study, using twenty publicly available data sets, show that better difficulty estimation functions will lead to both tighter and more specific prediction intervals. Furthermore, it is found that the $\\beta$ -values used strongly affect the conformal regressor. While there is no specific $\\beta$ -value that will always minimize the interval sizes, lower $\\beta$ -values lead to more variation in the interval sizes, i.e., more specific models. In addition, the analysis also identifies that the normalization procedure introduces a small but unfortunate bias in the models. More specifically, normalization using low $\\beta$ -values means that smaller intervals are more likely to be erroneous, while the opposite is true for higher $\\beta$ -values.","PeriodicalId":255763,"journal":{"name":"2021 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI50451.2021.9659853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Conformal prediction can be applied on top of any machine learning predictive regression model, thus turning it into a conformal regressor. Given a significance level $\epsilon$, conformal regressors output valid prediction intervals, i.e., the probability that the interval covers the true value is exactly $1-\epsilon$. To obtain validity, a calibration set that is not used for training the model must be set aside. In standard inductive conformal regression, the size of the prediction intervals is then determined by the absolute error made by the predictive model on a specific instance in the calibration set, where different significance levels correspond to different instances. In this setting, all prediction intervals will have the same size, making the resulting models very unspecific. When adding a technique called normalization, however, the difficulty of each instance is estimated, and the interval sizes are adjusted accordingly. An integral part of normalized conformal regressors is a parameter called $\beta$, which determines the relative importance of the difficulty estimation and the error of the model. In this study, the effects of different underlying models, difficulty estimation functions and $\beta$-values are investigated. The results from a large empirical study, using twenty publicly available data sets, show that better difficulty estimation functions will lead to both tighter and more specific prediction intervals. Furthermore, it is found that the $\beta$-values used strongly affect the conformal regressor. While there is no specific $\beta$-value that will always minimize the interval sizes, lower $\beta$-values lead to more variation in the interval sizes, i.e., more specific models. In addition, the analysis also identifies that the normalization procedure introduces a small but unfortunate bias in the models. More specifically, normalization using low $\beta$-values means that smaller intervals are more likely to be erroneous, while the opposite is true for higher $\beta$-values.
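To make the interval construction concrete, the sketch below implements both the standard and the normalized inductive conformal regressor described in the abstract. It is a minimal sketch under stated assumptions: the California-housing data, the random-forest underlying model, the kNN difficulty estimator fitted to the underlying model's absolute training errors, and $\beta = 0.1$ are illustrative choices, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Split into proper training set, calibration set and test set.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
epsilon, beta = 0.1, 0.1  # significance level and normalization parameter (illustrative)

# Standard ICP: nonconformity is the absolute calibration error; the
# (1 - epsilon) quantile gives one half-width shared by all test instances.
alphas = np.sort(np.abs(y_cal - model.predict(X_cal)))
k = int(np.ceil((1 - epsilon) * (len(alphas) + 1))) - 1
fixed_half_width = alphas[k]

# Normalized ICP: estimate per-instance difficulty sigma(x) with a kNN model
# fitted to the underlying model's absolute training errors (one common choice).
diff_model = KNeighborsRegressor(n_neighbors=25).fit(
    X_train, np.abs(y_train - model.predict(X_train)))
alphas_norm = np.sort(
    np.abs(y_cal - model.predict(X_cal)) / (diff_model.predict(X_cal) + beta))
alpha_s = alphas_norm[k]

# Instance-specific intervals: harder instances (large sigma) get wider
# intervals; a large beta pushes all widths toward the fixed-size case above.
y_hat = model.predict(X_test)
half_width = alpha_s * (diff_model.predict(X_test) + beta)
lower, upper = y_hat - half_width, y_hat + half_width
```

With a low $\beta$, the difficulty estimate dominates and the interval widths vary strongly between instances; with a high $\beta$, the widths approach the fixed size of the standard regressor, which matches the trade-off between specificity and the bias discussed in the abstract.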