Enhancing Genetic Risk Prediction through Federated Semi-Supervised Transfer Learning with Inaccurate Electronic Health Record Data.

Impact factor: 0.4 (JCR Q4, Mathematical & Computational Biology)

Statistics in Biosciences | Pub Date: 2024-08-13 | DOI: 10.1007/s12561-024-09449-2

Yuying Lu, Tian Gu, Rui Duan


Citations: 0

Abstract

Large-scale genomics data combined with Electronic Health Records (EHRs) illuminate the path towards personalized disease management and enhanced medical interventions. However, the absence of "gold standard" disease labels makes developing machine learning models a challenging task. Additionally, imbalances in demographic representation within datasets hinder the development of unbiased healthcare solutions. In response to these challenges, we introduce FEderated Semi-Supervised Transfer Learning (FEST) for improving disease risk prediction in underrepresented populations. FEST enables collaborative model training across institutions by leveraging both labeled and unlabeled data from diverse subpopulations. It addresses distributional differences across populations and healthcare institutions by combining density ratio reweighting with model calibration techniques. We develop federated learning algorithms that train models using only summary-level statistics. We perform simulation studies to assess the efficacy of FEST in comparison with several alternative methods. We then apply FEST to train a genetic risk prediction model for type 2 diabetes that targets the African-ancestry population, using data from the Massachusetts General Brigham (MGB) Biobank. Both our computational experiments and the real-world data application demonstrate the superior performance of FEST over competing methods.
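The abstract names two generic ingredients: density ratio reweighting and federated training from summary-level statistics. The block below is a minimal sketch of how those two ideas can fit together, assuming a logistic model both for the target-vs-source density ratio and for the outcome. It is not the FEST algorithm: the semi-supervised use of unlabeled data, the handling of inaccurate EHR labels, and the calibration step are all omitted. The function names (`site_summary`, `federated_newton`, `density_ratio_model`) and the simulated data are hypothetical.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the first coefficient is an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def site_summary(X, y, w, beta):
    """Summary statistics computed locally at one site: the score (gradient) and
    information matrix X^T W X (negative Hessian) of the weighted logistic
    log-likelihood at beta. Only these aggregates leave the site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (w * (y - p))
    info = (X * (w * p * (1.0 - p))[:, None]).T @ X
    return grad, info


def federated_newton(sites, n_iter=50, tol=1e-10):
    """Newton-Raphson for weighted logistic regression in which the coordinator
    only sums the per-site (gradient, information) pairs."""
    beta = np.zeros(sites[0][0].shape[1])
    for _ in range(n_iter):
        grad = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for X, y, w in sites:
            g, h = site_summary(X, y, w, beta)
            grad += g
            info += h
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def density_ratio_model(source_Xs, target_X):
    """Hypothetical helper: estimate w(x) = p_target(x) / p_source(x) with a
    logistic classifier (target = 1, pooled sources = 0), itself fit federatedly."""
    sites = [(X, np.zeros(len(X)), np.ones(len(X))) for X in source_Xs]
    sites.append((target_X, np.ones(len(target_X)), np.ones(len(target_X))))
    gamma = federated_newton(sites)
    n_source = sum(len(X) for X in source_Xs)
    n_target = len(target_X)
    # Bayes' rule: p_T(x)/p_S(x) = [P(T|x)/P(S|x)] * (n_S / n_T)
    return lambda X: np.exp(X @ gamma) * n_source / n_target


# Simulated example (made-up data): two labeled source sites, one shifted target.
rng = np.random.default_rng(0)
beta_true = np.array([-1.0, 0.8, -0.5])


def simulate(n, shift):
    X = add_intercept(rng.normal(shift, 1.0, size=(n, 2)))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    return X, y


source = [simulate(800, 0.0), simulate(600, 0.3)]  # labeled source sites
target_X, _ = simulate(400, 1.0)                    # target labels are discarded
ratio = density_ratio_model([X for X, _ in source], target_X)
weighted_sites = [(X, y, ratio(X)) for X, y in source]
beta_hat = federated_newton(weighted_sites)
print("target-reweighted coefficients:", beta_hat)
```

Because the gradient and information matrix of a sum of per-site log-likelihoods are just sums of the per-site quantities, each Newton step in this sketch is identical to one computed on pooled data; only those aggregates, never individual records, are exchanged.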


Source journal: Statistics in Biosciences (Mathematical & Computational Biology)

CiteScore: 2.00

Self-citation rate: 0.00%

Journal description: Statistics in Biosciences (SIBS) is published three times a year in print and electronic form. It aims at the development and application of statistical methods, and their interface with other quantitative methods such as computational and mathematical methods, in the biological and life sciences, health sciences, and biopharmaceutical and biotechnological sciences. SIBS publishes scientific papers and review articles in four sections, the first two of which are the primary sections. Original Articles present novel statistical and quantitative methods in the biosciences. Bioscience Case Studies and Practice Articles advance statistical practice in the biosciences through case studies, innovative applications of existing methods that further understanding of subject-matter science, and evaluations of existing methods and data sources. Review Articles survey an area of statistical and quantitative methodology, software, or data sources in the biosciences. Commentaries offer perspectives on research topics or policy issues of current quantitative interest in the biosciences, reactions to articles published in the journal, and scholarly essays. For an article to be acceptable, substantive science must motivate and demonstrate the methodological development and its use. Articles published in SIBS share the goal of promoting evidence-based real-world practice and policy making through effective and timely interaction and communication between statisticians and quantitative researchers and subject-matter scientists in the biosciences.