Bad estimation, good prediction: the Lasso in dense regimes
Andrea Bratsberg, Magne Thoresen, Jelle J. Goeman
American Statistician (Q1, Statistics & Probability), DOI: 10.1080/00031305.2025.2569464, published 2025-10-08 (Journal Article)

Abstract: For high-dimensional omics data, sparsity-inducing regularization methods such as the Lasso are widely used and often yield strong predictive performance, even in settings where the assumption of sparsity is likely violated. We demonstrate that under a specific dense model, namely the high-dimensional joint latent variable model, the Lasso produces sparse prediction rules with favorable prediction error bounds, even when the underlying regression coefficient vector is not sparse at all. We further argue that this model better represents many types of omics data than sparse linear regression models. We prove that the prediction bound under this model in fact decreases with an increasing number of predictors, and confirm this through simulation examples. These results highlight the need for caution when interpreting sparse prediction rules, as strong prediction accuracy of a sparse prediction rule may not imply underlying biological significance of the individual predictors.
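The phenomenon the abstract describes can be illustrated with a minimal simulation. This is not the authors' code; it is a hedged sketch under simple assumptions: a single latent variable drives both all predictors and the response (so the true coefficient vector relating X to y is dense), yet a cross-validated Lasso fit selects only a small subset of predictors while still predicting well.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
p = 500                                  # number of predictors (assumed for illustration)
loadings = rng.standard_normal(p)        # how strongly the latent variable loads on each predictor

def simulate(n):
    """Draw n samples from a one-factor joint latent variable model:
    every predictor and the response are noisy views of the same latent z."""
    z = rng.standard_normal(n)
    X = np.outer(z, loadings) + rng.standard_normal((n, p))
    y = z + 0.5 * rng.standard_normal(n)
    return X, y

X_train, y_train = simulate(100)
X_test, y_test = simulate(200)

# Cross-validated Lasso: the true signal is dense, but the fitted rule is sparse.
model = LassoCV(cv=5).fit(X_train, y_train)
n_selected = int(np.sum(model.coef_ != 0))
r2_test = model.score(X_test, y_test)

print(f"selected {n_selected} of {p} predictors, test R^2 = {r2_test:.2f}")
```

Under this data-generating process the selected predictors are interchangeable proxies for the latent variable, so good predictive accuracy of the sparse rule says nothing special about the individual variables it happens to pick, which is the cautionary point of the paper.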
Journal overview:
Are you looking for general-interest articles about current national and international statistical problems and programs; interesting and fun articles of a general nature about statistics and its applications; or the teaching of statistics? Then you are looking for The American Statistician (TAS), published quarterly by the American Statistical Association. TAS contains timely articles organized into the following sections: Statistical Practice, General, Teacher's Corner, History Corner, Interdisciplinary, Statistical Computing and Graphics, Reviews of Books and Teaching Materials, and Letters to the Editor.