Robust angle-based transfer learning in high dimensions.

IF 3.1 · Zone 1 (Mathematics) · JCR Q1 (Statistics & Probability)

Journal of the Royal Statistical Society Series B-Statistical Methodology. Pub Date: 2024-12-03. eCollection Date: 2025-07-01. DOI: 10.1093/jrsssb/qkae111

Tian Gu, Yi Han, Rui Duan

Volume 87(3), pages 723-745. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256125/pdf/

Citation count: 0

Abstract

Transfer learning improves target model performance by leveraging data from related source populations, especially when target data are scarce. This study addresses the challenge of training high-dimensional regression models with limited target data in the presence of heterogeneous source populations. We focus on a practical setting where only parameter estimates of pretrained source models are available, rather than individual-level source data. For a single source model, we propose a novel angle-based transfer learning (angleTL) method that leverages concordance between source and target model parameters. AngleTL adapts to the signal strength of the target model, unifies several benchmark methods, and mitigates negative transfer when between-population heterogeneity is large. We extend angleTL to incorporate multiple source models, accounting for varying levels of relevance among them. Our high-dimensional asymptotic analysis provides insights into when a source model benefits the target model and demonstrates the superiority of angleTL over other methods. Extensive simulations validate these findings and highlight the feasibility of applying angleTL to transfer genetic risk prediction models across multiple biobanks.
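The abstract does not state the angleTL estimator, so the following is only a minimal sketch of the general idea, assuming a ridge-type objective with an added reward for alignment (inner product) with the source estimate. The objective, the function name `angle_tl_sketch`, and the tuning parameters `lam` and `eta` are illustrative assumptions, not the authors' definitions. Under this assumed form, `eta = 0` gives plain ridge regression on the target data, and `eta = lam` gives ridge shrinkage toward the source estimate, which is one way a single estimator can contain several benchmark methods as special cases.

```python
import numpy as np


def angle_tl_sketch(X, y, w_source, lam, eta):
    """Closed-form minimizer of the ASSUMED objective

        (1 / (2n)) * ||y - X b||^2 + (lam / 2) * ||b||^2 - eta * (w_source @ b)

    X        : (n, p) target design matrix
    y        : (n,) target responses
    w_source : (p,) pretrained source coefficient estimate
    lam      : ridge penalty, must be > 0 so the system is well-posed when p > n
    eta      : weight on alignment with the source estimate (hypothetical name)
    """
    n, p = X.shape
    # Setting the gradient to zero gives
    # (X'X / n + lam * I) b = X'y / n + eta * w_source
    A = X.T @ X / n + lam * np.eye(p)
    rhs = X.T @ y / n + eta * w_source
    return np.linalg.solve(A, rhs)


# Small simulated example: scarce target data (n < p) and a source
# estimate that points in a similar direction to the target coefficients.
rng = np.random.default_rng(0)
n, p = 50, 200
beta_target = rng.normal(size=p) / np.sqrt(p)
w_source = beta_target + 0.5 * rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_target + rng.normal(size=n)

beta_ridge = angle_tl_sketch(X, y, w_source, lam=1.0, eta=0.0)     # target-only ridge
beta_transfer = angle_tl_sketch(X, y, w_source, lam=1.0, eta=1.0)  # shrink toward source

for name, b in [("ridge", beta_ridge), ("transfer", beta_transfer)]:
    print(name, "squared estimation error:", float(np.sum((b - beta_target) ** 2)))
```

In practice `lam` and `eta` would be tuned, for example by cross-validation on the target data; the fixed values above are placeholders.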


Source journal

Journal of the Royal Statistical Society Series B-Statistical Methodology (Mathematics: Statistics & Probability)

CiteScore: 8.80

Self-citation rate: 0.00%

Articles published:

Review time: >12 weeks

About the journal: Series B (Statistical Methodology) aims to publish high quality papers on the methodological aspects of statistics and data science more broadly. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved and original contributions to the foundations of statistical science. Reviews of methodological techniques are also considered. A paper, even if correct and well presented, is likely to be rejected if it only presents straightforward special cases of previously published work, if it is of mathematical interest only, if it is too long in relation to the importance of the new material that it contains or if it is dominated by computations or simulations of a routine nature.