Doubly robust and computationally efficient high-dimensional variable selection
Abhinav Chakraborty, Jeffrey Zhang, Eugene Katsevich
arXiv - STAT - Methodology, 2024-09-14 (arXiv:2409.09512)
The variable selection problem is to discover which of a large set of
predictors is associated with an outcome of interest, conditionally on the
other predictors. This problem has been widely studied, but each existing approach
lacks at least one of the following: power against complex alternatives,
robustness to model misspecification, computational efficiency, or
quantification of evidence against individual hypotheses. We present tower PCM
(tPCM), a statistically and
computationally efficient solution to the variable selection problem that does
not suffer from these shortcomings. tPCM adapts the best aspects of two
existing procedures that are based on similar functionals: the holdout
randomization test (HRT) and the projected covariance measure (PCM). The former
is a model-X test that uses many resamples but few machine learning fits,
whereas the latter is an asymptotic, doubly robust-style test of a single
hypothesis that requires no resamples but many machine learning fits.
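To make the resample-versus-fit trade-off concrete, here is a toy Python sketch under loudly stated assumptions: a Gaussian design with known covariance `Sigma` (the model-X assumption), ordinary least squares standing in for the machine learning fits, and a single 50/50 sample split. `hrt_pvalues` fits one model and then resamples each X_j from its known conditional law many times; `residual_product_pvalues` draws no resamples but refits regressions for every variable, in the spirit of PCM. All function names are hypothetical; the second function is a simplified residual-product statistic, not the exact PCM procedure, and neither function is tPCM.

```python
import numpy as np
from math import erf, sqrt


def fit_ols(X, y):
    """Stand-in 'machine learning' fit (assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def gaussian_conditional(Sigma, j):
    """Regression weights and variance of X_j given X_{-j} when X ~ N(0, Sigma)."""
    rest = [k for k in range(Sigma.shape[0]) if k != j]
    weights = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, j])
    var = Sigma[j, j] - Sigma[j, rest] @ weights
    return rest, weights, var


def hrt_pvalues(X, y, Sigma, n_resamples=200, seed=0):
    """HRT-style sketch: one model fit shared by all p variables, many resamples each."""
    rng = np.random.default_rng(seed)
    half = len(y) // 2
    model = fit_ols(X[:half], y[:half])          # the only model fit
    Xt, yt = X[half:], y[half:]
    observed = np.mean((yt - model(Xt)) ** 2)    # held-out loss on the real data
    pvals = []
    for j in range(X.shape[1]):
        rest, w, var = gaussian_conditional(Sigma, j)
        cond_mean = Xt[:, rest] @ w
        hits = 0
        for _ in range(n_resamples):
            Xk = Xt.copy()
            # Model-X step: redraw X_j from its known law given X_{-j}
            Xk[:, j] = cond_mean + sqrt(var) * rng.standard_normal(len(yt))
            # If X_j matters, resampled losses should exceed the observed loss
            hits += int(np.mean((yt - model(Xk)) ** 2) <= observed)
        pvals.append((1 + hits) / (1 + n_resamples))
    return np.array(pvals)


def residual_product_pvalues(X, y):
    """Simplified PCM-flavoured sketch: no resamples, 1 + 3p regression fits."""
    n, p = X.shape
    half = n // 2
    XA, yA, XB, yB = X[:half], y[:half], X[half:], y[half:]
    g = fit_ols(XA, yA)                          # E[Y | X] on split A, shared for brevity
    pvals = []
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        m = fit_ols(XA[:, rest], g(XA))          # E[g(X) | X_{-j}] on split A
        f_B = g(XB) - m(XB[:, rest])             # learned direction, evaluated on split B
        r_y = yB - fit_ols(XB[:, rest], yB)(XB[:, rest])
        r_f = f_B - fit_ols(XB[:, rest], f_B)(XB[:, rest])
        L = r_y * r_f                            # residual products
        T = sqrt(len(L)) * L.mean() / L.std(ddof=1)
        pvals.append(0.5 * (1 - erf(T / sqrt(2))))  # one-sided normal p-value
    return np.array(pvals)


# Usage: only X_0 is associated with y
rng = np.random.default_rng(1)
n, p = 400, 5
Sigma = np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = 2.0 * X[:, 0] + rng.standard_normal(n)
print(hrt_pvalues(X, y, Sigma))
print(residual_product_pvalues(X, y))
```

In this sketch the HRT-style function makes one fit and p × `n_resamples` extra predictions, while the residual-product function makes 1 + 3p fits and no resamples, mirroring the contrast described above.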
Theoretically, we establish the validity of tPCM and, perhaps surprisingly,
the asymptotic equivalence of HRT, PCM, and tPCM. In so doing, we clarify the
relationship between two methods from two separate literatures. An extensive
simulation study verifies that tPCM can yield substantial computational savings
compared to HRT and PCM, while maintaining nearly identical power.