Robust Transfer Learning for High-Dimensional GLM Using γ-Divergence With Applications to Cancer Genomics
Authors: Fuzhi Xu, Shuangge Ma, Qingzhao Zhang, Yaqing Xu
Journal: Statistics in Medicine, vol. 44, no. 15-17, e70170 (published 2025-07-01)
DOI: https://doi.org/10.1002/sim.70170
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12313224/pdf/
Citations: 0
Abstract
In the analysis of complex diseases, high-dimensional profiling data are important for assessing risk and detecting biomarkers. Although cancer genomic data are increasingly accessible, sample sizes remain limited in most studies, so borrowing information from additional data sources is desirable to improve estimation and prediction. Transfer learning has been shown to be flexible and effective in boosting modeling performance, with a track record in biomedical applications. However, existing transfer learning methods often lack robustness to outliers and data contamination, both of which are common in real-world biomedical data. In this study, we propose a robust transfer learning approach based on the minimum γ-divergence under a generalized linear model (GLM) framework for high-dimensional data. Our method incorporates a data-driven source detection scheme that automatically identifies informative sources while mitigating the risk of negative transfer. We establish rigorous theoretical results, including consistency and high-dimensional estimation error bounds, that support the robustness and reliability of the approach. A computationally efficient algorithm based on proximal gradient descent is developed for both the transfer and debiasing steps. Simulations demonstrate the superior or competitive performance of the proposed approach in selection and prediction/classification. We further validate its practical utility by analyzing data on breast cancer and glioblastoma, showcasing the method's effectiveness in real-world high-dimensional settings.
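The abstract names proximal gradient descent as the computational engine but does not spell out the updates. As a rough illustration only, the sketch below runs a generic proximal gradient (ISTA) loop for an L1-penalized logistic GLM. It is not the authors' algorithm: it swaps in the ordinary logistic negative log-likelihood for the γ-divergence loss, and it omits the transfer, source detection, and debiasing steps. The function names, the simulated data, and the penalty level `lam` are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logistic_grad(beta, X, y):
    """Gradient of the average logistic negative log-likelihood.
    Stand-in loss; the paper uses a gamma-divergence-based loss instead."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / X.shape[0]

def prox_grad_l1_logistic(X, y, lam, n_iter=500):
    """ISTA for L1-penalized logistic regression with a fixed step size."""
    n, p = X.shape
    # The gradient is Lipschitz with constant at most ||X||_2^2 / (4 n),
    # so 1 / L is a safe fixed step size.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient step on the smooth loss, then prox step on the L1 penalty.
        beta = soft_threshold(beta - step * logistic_grad(beta, X, y), step * lam)
    return beta

# Hypothetical simulated data: 200 samples, 50 features, 3 true signals.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_hat = prox_grad_l1_logistic(X, y, lam=0.05)
print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

In a robust variant, one would presumably replace `logistic_grad` with the gradient of the γ-divergence loss while keeping the same soft-thresholding step, since the L1 proximal operator does not depend on the choice of smooth loss.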
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies in which creative use or technical generalization of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.