Bad estimation, good prediction: the Lasso in dense regimes
Andrea Bratsberg, Magne Thoresen, Jelle J. Goeman
American Statistician (Q1, Statistics & Probability), DOI: 10.1080/00031305.2025.2569464, published 2025-10-08 (Journal Article)

Abstract: For high-dimensional omics data, sparsity-inducing regularization methods such as the Lasso are widely used and often yield strong predictive performance, even in settings where the assumption of sparsity is likely violated. We demonstrate that under a specific dense model, namely the high-dimensional joint latent variable model, the Lasso produces sparse prediction rules with favorable prediction error bounds, even when the underlying regression coefficient vector is not sparse at all. We further argue that this model better represents many types of omics data than sparse linear regression models. We prove that the prediction bound under this model in fact decreases with an increasing number of predictors, and confirm this through simulation examples. These results highlight the need for caution when interpreting sparse prediction rules, as strong prediction accuracy of a sparse prediction rule may not imply underlying biological significance of the individual predictors.
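The phenomenon the abstract describes can be illustrated with a minimal simulation. This is not the authors' code; it is a hedged sketch under simple assumptions: a single latent variable drives both all predictors and the response (so the true coefficient vector relating X to y is dense), yet a cross-validated Lasso fit selects only a small subset of predictors while still predicting well.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
p = 500                                  # number of predictors (assumed for illustration)
loadings = rng.standard_normal(p)        # how strongly the latent variable loads on each predictor

def simulate(n):
    """Draw n samples from a one-factor joint latent variable model:
    every predictor and the response are noisy views of the same latent z."""
    z = rng.standard_normal(n)
    X = np.outer(z, loadings) + rng.standard_normal((n, p))
    y = z + 0.5 * rng.standard_normal(n)
    return X, y

X_train, y_train = simulate(100)
X_test, y_test = simulate(200)

# Cross-validated Lasso: the true signal is dense, but the fitted rule is sparse.
model = LassoCV(cv=5).fit(X_train, y_train)
n_selected = int(np.sum(model.coef_ != 0))
r2_test = model.score(X_test, y_test)

print(f"selected {n_selected} of {p} predictors, test R^2 = {r2_test:.2f}")
```

Under this data-generating process the selected predictors are interchangeable proxies for the latent variable, so good predictive accuracy of the sparse rule says nothing special about the individual variables it happens to pick, which is the cautionary point of the paper.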
Journal overview:
Are you looking for general-interest articles about current national and international statistical problems and programs; interesting and fun articles of a general nature about statistics and its applications; or the teaching of statistics? Then you are looking for The American Statistician (TAS), published quarterly by the American Statistical Association. TAS contains timely articles organized into the following sections: Statistical Practice, General, Teacher's Corner, History Corner, Interdisciplinary, Statistical Computing and Graphics, Reviews of Books and Teaching Materials, and Letters to the Editor.