Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement.

IF 10.4 | Zone 1 (Biology) | Q1, GENETICS & HEREDITY
Yan Gao, Yan Cui
Genome Medicine, 16(1): 76. Published 2024-06-04. DOI: 10.1186/s13073-024-01345-0
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11149372/pdf/

Abstract

Background: Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets.

Methods: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups.
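The Methods paragraph describes a general pattern. A model first learns from the data-rich ancestry group and is then adapted to a data-disadvantaged group. The approach is tested on synthetic cohorts with unequal sample sizes and shifted distributions. The sketch below shows that generic pretrain-then-fine-tune pattern in NumPy. It is not the authors' model. The data generator `make_group`, the one-hidden-layer network standing in for a deep network, and all group sizes, shifts, epochs, and learning rates are hypothetical choices made for illustration.

```python
import numpy as np

N_FEATURES = 10


def make_group(n, shift, effect, rng):
    """Hypothetical synthetic cohort for one ancestry group.

    `shift` moves the feature means (covariate shift); `effect` scales a
    group-specific term in the outcome (shift in the feature-outcome relation).
    """
    X = rng.normal(loc=shift, scale=1.0, size=(n, N_FEATURES))
    score = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) - 0.5 * X[:, 3] + effect * X[:, 4]
    y = (score + 0.3 * rng.normal(size=n) > 0).astype(float)
    return X, y


def init_mlp(n_hidden, rng):
    """One-hidden-layer network with He-style initialisation (stand-in for a deep net)."""
    return {
        "W1": rng.normal(scale=np.sqrt(2.0 / N_FEATURES), size=(N_FEATURES, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=np.sqrt(1.0 / n_hidden), size=n_hidden),
        "b2": np.zeros(()),
    }


def forward(p, X):
    H = np.maximum(0.0, X @ p["W1"] + p["b1"])        # ReLU hidden layer
    z = np.clip(H @ p["w2"] + p["b2"], -30.0, 30.0)   # logits, clipped so exp cannot overflow
    return H, 1.0 / (1.0 + np.exp(-z))                # sigmoid disease-risk probability


def train(p, X, y, epochs, lr):
    """Full-batch gradient descent on mean binary cross-entropy.

    Returns new parameters and leaves `p` unchanged, so pretrained weights
    can be reused as the starting point for fine-tuning.
    """
    p = dict(p)
    n = len(y)
    for _ in range(epochs):
        H, prob = forward(p, X)
        dz = (prob - y) / n                        # dLoss/dlogit for sigmoid + BCE
        dH = np.outer(dz, p["w2"]) * (H > 0)       # backprop through the ReLU
        grads = {"W1": X.T @ dH, "b1": dH.sum(axis=0), "w2": H.T @ dz, "b2": dz.sum()}
        p = {k: p[k] - lr * grads[k] for k in p}   # new arrays, no in-place update
    return p


def accuracy(p, X, y):
    return float(np.mean((forward(p, X)[1] > 0.5) == (y == 1.0)))


rng = np.random.default_rng(0)
X_src, y_src = make_group(2000, shift=0.0, effect=0.5, rng=rng)   # data-rich group
X_tgt, y_tgt = make_group(100, shift=0.5, effect=1.0, rng=rng)    # data-disadvantaged group
X_test, y_test = make_group(1000, shift=0.5, effect=1.0, rng=rng) # held-out target data

p0 = init_mlp(32, rng)
pretrained = train(p0, X_src, y_src, epochs=500, lr=0.1)                 # step 1: source group
transferred = train(pretrained, X_tgt, y_tgt, epochs=200, lr=0.05)       # step 2: fine-tune
target_only = train(init_mlp(32, rng), X_tgt, y_tgt, epochs=500, lr=0.1) # baseline, no transfer

for name, model in [("source only", pretrained), ("target only", target_only),
                    ("transfer", transferred)]:
    print(f"{name:12s} target-group test accuracy: {accuracy(model, X_test, y_test):.3f}")
```

The printed accuracies depend on the seed and on these made-up settings. They are not results from the study. Fine-tuning starts from the pretrained weights instead of a fresh initialisation. This lets the small target cohort adjust representations learned from the larger one, which a model trained on the target group alone cannot do.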

Results: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations.

Conclusions: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
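The Pareto-improvement criterion in the Conclusions can be written as a check on per-group prediction accuracy. No group may become worse, and at least one group must become strictly better. The following is a minimal sketch under that reading. The function name, the dict-of-accuracies interface, and the optional tolerance `tol` are hypothetical. The group names and numbers are invented for illustration and are not figures from the paper.

```python
from typing import Mapping


def is_pareto_improvement(baseline: Mapping[str, float],
                          candidate: Mapping[str, float],
                          tol: float = 0.0) -> bool:
    """True if no group loses more than `tol` and some group gains more than `tol`."""
    if set(baseline) != set(candidate):
        raise ValueError("both models must be evaluated on the same groups")
    no_group_worse = all(candidate[g] >= baseline[g] - tol for g in baseline)
    some_group_better = any(candidate[g] > baseline[g] + tol for g in baseline)
    return no_group_worse and some_group_better


# Illustrative per-group accuracies (made up, not from the study).
baseline = {"majority_group": 0.80, "disadvantaged_group": 0.62}
print(is_pareto_improvement(baseline, {"majority_group": 0.80, "disadvantaged_group": 0.70}))  # True
print(is_pareto_improvement(baseline, {"majority_group": 0.78, "disadvantaged_group": 0.75}))  # False
```

The second candidate raises the disadvantaged group's accuracy but lowers the majority group's, so it is a trade-off and not a Pareto improvement. A small `tol` can absorb evaluation noise when accuracies are estimated on finite test sets.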

Source journal: Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
About the journal: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.