Post-Estimation Shrinkage in Full and Selected Linear Regression Models in Low-Dimensional Data Revisited

Edwin Kipruto, Willi Sauerbrei

Biometrical Journal, vol. 66, issue 7, published 2024-09-27. DOI: 10.1002/bimj.202300368. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300368
The fit of a regression model to new data is often worse than its fit to the training data because of overfitting. Analysts use variable selection techniques to develop parsimonious regression models, but selection may introduce bias into regression estimates. Shrinkage methods have been proposed to mitigate overfitting and reduce bias in estimates; post-estimation shrinkage is an alternative to penalized methods. This study evaluates the effectiveness of post-estimation shrinkage in improving the prediction performance of full and selected models. In a simulation study, results were compared with ordinary least squares (OLS) and ridge regression in full models, and with best subset selection (BSS) and the lasso in selected models. We focused on prediction errors and the number of selected variables. Additionally, we proposed a modified version of the parameter-wise shrinkage (PWS) approach, named non-negative PWS (NPWS), to address weaknesses of PWS. No method was superior in all scenarios. In full models, NPWS outperformed global shrinkage, whereas PWS was inferior to OLS. Under low correlation with a moderate-to-high signal-to-noise ratio (SNR), NPWS outperformed ridge, but ridge performed best with small sample sizes, high correlation, and low SNR. In selected models, all post-estimation shrinkage methods performed similarly, with global shrinkage slightly inferior. The lasso outperformed BSS and post-estimation shrinkage with small sample sizes, low SNR, and high correlation, but was inferior when the opposite held. Our study suggests that, given sufficient information, NPWS is more effective than global shrinkage in improving the prediction accuracy of models. However, with high correlation, small sample sizes, and low SNR, penalized methods generally outperform post-estimation shrinkage methods.
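To illustrate the idea behind the methods compared in the abstract, the sketch below implements post-estimation shrinkage after OLS in the usual cross-validation-based way: global shrinkage estimates one factor for the whole linear predictor, PWS estimates one factor per coefficient, and NPWS additionally constrains those factors to be non-negative. This is a minimal illustrative sketch, not the authors' code; the function names and the 5-fold setup are assumptions.

```python
# Minimal sketch of cross-validation-based post-estimation shrinkage.
# All function names and the 5-fold scheme are illustrative assumptions.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cv_partial_predictors(X, y, k=5):
    """For each fold, fit OLS on the training part and form the
    per-variable partial predictors x_ij * beta_j on the held-out part."""
    n, p = X.shape
    Z = np.empty((n, p))
    for test in np.array_split(rng.permutation(n), k):
        train = np.setdiff1d(np.arange(n), test)
        beta = ols(X[train], y[train])
        Z[test] = X[test] * beta          # column j holds x_ij * beta_j
    return Z

def shrinkage_factors(X, y, method="npws"):
    Z = cv_partial_predictors(X, y)
    if method == "global":
        # one common factor: regress y on the cross-validated linear predictor
        lp = Z.sum(axis=1)
        return np.full(X.shape[1], float(lp @ y / (lp @ lp)))
    if method == "pws":
        return ols(Z, y)                  # one (possibly negative) factor per beta_j
    if method == "npws":
        return nnls(Z, y)[0]              # factors constrained to be non-negative
    raise ValueError(method)

# toy data with two null predictors
n, p = 100, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, 1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

beta_ols = ols(X, y)
for m in ("global", "pws", "npws"):
    c = shrinkage_factors(X, y, method=m)
    print(m, np.round(c, 2), "shrunken beta:", np.round(c * beta_ols, 2))
```

The shrunken estimates are `c * beta_ols`; under NPWS a factor of exactly zero effectively removes a variable, which is one way per-coefficient shrinkage interacts with selection.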
About the journal:
Biometrical Journal publishes papers on statistical methods and their applications in the life sciences, including medicine, environmental sciences, and agriculture. Methodological developments should be motivated by an interesting and relevant problem from these areas. Ideally, the manuscript should include a description of the problem and a section detailing the application of the new methodology to the problem. Case studies, review articles, and letters to the editor are also welcome. Papers containing only extensive mathematical theory are not suitable for publication in Biometrical Journal.