Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement.

IF 10.4 | Zone 1, Biology | JCR Q1, GENETICS & HEREDITY

Genome Medicine. Pub Date: 2024-06-04. DOI: 10.1186/s13073-024-01345-0

Yan Gao, Yan Cui

Genome Medicine, volume 16, issue 1, page 76. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11149372/pdf/


Abstract

Background: Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets.

Methods: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups.
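The setting described in the Methods can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the group sizes, feature shifts, weights, network size, and learning rates below are all hypothetical, and the function and class names (`make_group`, `TinyMLP`, `fit`, `run_transfer`) are introduced only for illustration. The sketch builds a synthetic dataset with data inequality (a large source group, a small target group) and a distribution shift between them, pretrains a tiny neural network on the source group, then fine-tunes it on the target group.

```python
import math
import random


def make_group(n, mean_shift, weights, rng):
    """Draw n (features, label) pairs for one hypothetical ancestry group."""
    samples = []
    for _ in range(n):
        x = [rng.gauss(mean_shift, 1.0) for _ in weights]
        # Noisy linear score; the label is 1 when the score is positive.
        score = sum(w * xi for w, xi in zip(weights, x)) + rng.gauss(0.0, 0.5)
        samples.append((x, 1 if score > 0 else 0))
    return samples


class TinyMLP:
    """One hidden tanh layer followed by a sigmoid output unit."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp to keep exp() from overflowing
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, y, lr):
        h, p = self.forward(x)
        err = p - y  # gradient of the log-loss w.r.t. the output pre-activation
        # Hidden-layer gradients are computed before w2 is updated.
        grad_h = [err * w * (1.0 - hi * hi) for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * err * hi
        self.b2 -= lr * err
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def fit(model, data, epochs, lr, rng):
    """Plain stochastic gradient descent over shuffled samples."""
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            model.sgd_step(x, y, lr)


def accuracy(model, data):
    return sum((model.predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)


def run_transfer(seed=0):
    rng = random.Random(seed)
    # Data inequality: the source group has 20x more training samples (illustrative).
    source = make_group(2000, 0.0, [1.0, -1.0, 0.5], rng)
    # Distribution shift: different feature mean and feature-label relationship.
    target_train = make_group(100, 0.5, [1.0, -0.5, -0.5], rng)
    target_test = make_group(500, 0.5, [1.0, -0.5, -0.5], rng)

    model = TinyMLP(n_in=3, n_hidden=8, rng=rng)
    fit(model, source, epochs=5, lr=0.05, rng=rng)         # pretrain on source
    before = accuracy(model, target_test)
    fit(model, target_train, epochs=20, lr=0.01, rng=rng)  # fine-tune on target
    after = accuracy(model, target_test)
    return before, after


if __name__ == "__main__":
    before, after = run_transfer()
    print(f"target accuracy before fine-tuning: {before:.3f}, after: {after:.3f}")
```

Fine-tuning all layers at a lower learning rate is only one of several transfer strategies; the abstract does not say which variant the authors used, so this choice is an assumption.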

Results: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations.

Conclusions: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
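The Pareto-improvement criterion in the Conclusions can be made concrete with a short check. This is a minimal sketch, assuming each model is summarized by one score per ancestry group where higher is better; the function name `is_pareto_improvement`, the group labels, and the numbers in the usage example are hypothetical and are not results from the paper.

```python
from typing import Mapping


def is_pareto_improvement(baseline: Mapping[str, float],
                          candidate: Mapping[str, float],
                          tol: float = 0.0) -> bool:
    """Return True if `candidate` is no worse than `baseline` for every
    group and strictly better for at least one group.

    `tol` is an optional tolerance for treating tiny differences as ties.
    """
    if baseline.keys() != candidate.keys():
        raise ValueError("both models must be scored on the same groups")
    no_group_worse = all(candidate[g] >= baseline[g] - tol for g in baseline)
    some_group_better = any(candidate[g] > baseline[g] + tol for g in baseline)
    return no_group_worse and some_group_better


if __name__ == "__main__":
    # Illustrative scores only (not taken from the study).
    baseline = {"group_a": 0.80, "group_b": 0.60}
    transfer = {"group_a": 0.80, "group_b": 0.70}
    print(is_pareto_improvement(baseline, transfer))
```

A model that raises the score of the data-disadvantaged group while lowering the score of another group would fail this check, which is the distinction the Conclusions draw between a Pareto improvement and a trade-off.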


Source journal

Genome Medicine (GENETICS & HEREDITY)

CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks

About the journal: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.