Out-of-sample $R^2$: estimation and inference

The American Statistician. Pub Date: 2023-02-10. DOI: 10.1080/00031305.2023.2216252

Stijn Hawinkel, W. Waegeman, Steven Maere



Abstract

Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative Brassica napus and Zea mays phenotypes based on gene expression data.
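The idea of an out-of-sample $R^2$ as a comparison of two predictive models can be illustrated with a minimal sketch. It assumes the reference model is one that predicts the training-set mean, that the model of interest is an ordinary least-squares fit, and that both mean squared errors are estimated by K-fold cross-validation. The function name `cv_out_of_sample_r2` and its parameters are hypothetical, and the sketch is not the unbiased estimator or the standard error proposed in the paper.

```python
import numpy as np

def cv_out_of_sample_r2(X, y, n_folds=10, seed=0):
    """Cross-validated R^2 = 1 - MSE(model) / MSE(mean-only reference).

    Illustrative only: a plain K-fold ratio of squared errors, not the
    paper's unbiased estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err_model, sq_err_null = [], []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Model of interest: least squares with intercept, fit on the training part only
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_err_model.append((y[test_idx] - A_test @ coef) ** 2)
        # Reference model: always predicts the mean of the training outcomes
        sq_err_null.append((y[test_idx] - y[train_mask].mean()) ** 2)
    mse_model = np.concatenate(sq_err_model).mean()
    mse_null = np.concatenate(sq_err_null).mean()
    return 1.0 - mse_model / mse_null

if __name__ == "__main__":
    # Simulated data (assumed for illustration, not from the paper)
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=200)
    print(cv_out_of_sample_r2(X, y))
```

Because the reference model is also evaluated on held-out data, this quantity can be negative when the model of interest predicts worse than the training mean, which cannot happen for the in-sample $R^2$ of a least-squares fit with an intercept.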

