Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings.

IF 1.2 4区数学

International Journal of Biostatistics Pub Date : 2025-09-08 DOI:10.1515/ijb-2024-0108

Mark A van de Wiel, Wessel N van Wieringen

{"title":"Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings.","authors":"Mark A van de Wiel, Wessel N van Wieringen","doi":"10.1515/ijb-2024-0108","DOIUrl":null,"url":null,"abstract":"Variable selection is challenging for high-dimensional data, in particular when sample size is low. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and is likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, crucial for the performance of those learners. We discuss technical aspects, but also the applicability in terms of types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/ijb-2024-0108","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Variable selection is challenging for high-dimensional data, in particular when sample size is low. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and is likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, crucial for the performance of those learners. We discuss technical aspects, but also the applicability in terms of types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.

查看原文本刊更多论文

利用外部信息引导自适应收缩，以提高变量选择在高维回归设置。

对于高维数据，特别是当样本量较低时，变量选择是具有挑战性的。人们普遍认为，有关变量的补充数据形式的外部信息，即“协数据”，可能会改善结果。例如，相关研究中的已知变量组或p值。由于公共存储库的可用性，这种协同数据在基因组学设置中无处不在，并且可能与其他应用程序同样相关。然而，在结构上使用这种协同数据的预测方法的吸收是有限的。我们回顾了引导自适应收缩方法：一类基于回归的学习器，它使用协数据来适应收缩参数，这对这些学习器的性能至关重要。我们讨论了技术方面的问题，但也讨论了可处理的协同数据类型的适用性。这类方法与其他几种方法作了对比。特别是，通过评估变量选择，将群体自适应收缩与更著名的稀疏群体lasso进行比较。此外，我们通过展示如何“自己动手”来展示引导收缩方法的多功能性：我们整合了共同数据学习器的实现和尖钉-板先验，以改善遗传学研究中的变量选择。我们以一个真实的数据示例作为总结。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Biostatistics Mathematics-Statistics and Probability

CiteScore

2.30

自引率

8.30%

发文量

期刊介绍： The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, as well as original applications of statistical methods, for important practical problems arising from the biological, medical, public health, and agricultural sciences with an emphasis on semiparametric methods. Given many alternatives to publish exist within biostatistics, IJB offers a place to publish for research in biostatistics focusing on modern methods, often based on machine-learning and other data-adaptive methodologies, as well as providing a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended that the journal cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows for data and software code to be appended, and opens the door for reproducible research allowing readers to easily replicate analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.