Conformalization of Sparse Generalized Linear Models

Proceedings of the ... International Conference on Machine Learning. Pub Date: 2023-07-11. DOI: 10.48550/arXiv.2307.05109

E. Guha, Eugène Ndiaye, X. Huo

Pages: 11871-11887

Citations: 1

Abstract

Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
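To make the computational bottleneck concrete, here is a minimal sketch of the brute-force baseline the abstract describes: full conformal prediction over a finite grid of candidate values, refitting a sparse linear model for every candidate. It is not the paper's path-following algorithm. The function names (`lasso_cd`, `full_conformal_set`), the absolute-residual conformity score, the plain coordinate-descent Lasso solver, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |.|
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Approximately minimise 0.5*||y - X b||^2 + lam*||b||_1
    by cyclic coordinate descent (hypothetical helper)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float)  # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]   # remove coordinate j's contribution
            beta[j] = soft_threshold(X[:, j] @ resid, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]   # add the updated contribution back
    return beta

def full_conformal_set(X, y, x_new, candidates, lam=1.0, alpha=0.1):
    """Return the candidates z whose conformal p-value exceeds alpha.
    One full model refit per candidate: this is the expensive part."""
    X_aug = np.vstack([X, x_new])
    kept = []
    for z in candidates:
        y_aug = np.append(y, z)                  # pretend y_{n+1} = z
        beta = lasso_cd(X_aug, y_aug, lam)       # refit on augmented data
        scores = np.abs(y_aug - X_aug @ beta)    # absolute-residual scores
        p_value = np.mean(scores >= scores[-1])  # rank of the new score
        if p_value > alpha:
            kept.append(z)
    return np.array(kept)

# Synthetic example (assumed data, for illustration only)
rng = np.random.default_rng(0)
n, p = 40, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)
x_new = rng.normal(size=p)

grid = np.linspace(y.min() - 1.0, y.max() + 1.0, 50)
conf_set = full_conformal_set(X, y, x_new, grid)
print(conf_set)
```

The cost grows linearly with the number of candidates, and a finite grid only approximates the set over the real line. This is the gap that the path-following approach in the abstract targets, by refitting only where the set of active features changes.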

