Building the world’s first truly global medical foundation model

IF 50 | Tier 1, Medicine | Q1, Biochemistry & Molecular Biology

Nature Medicine Pub Date : 2025-09-08 DOI:10.1038/s41591-025-03859-5

Yih Chung Tham, Jocelyn Hui Lin Goh, Paul Nderitu, Yukun Zhou, An Ran Ran, Sahana Srinivasan, Gabriel Dawei Yang, Gatera Fiston Kitema, Polly Rawlinson, Hongyang Jiang, Ke Zou, Carol Y. Cheung, Pearse A. Keane


Citations: 0

Abstract

Since 2022, the field of medical artificial intelligence (AI) has begun a shift toward foundation models, machine learning systems that are trained on broad data at scale and are adaptable to a wide range of downstream tasks^1,2. Medical foundation models are rapidly evolving, driven by the synergy of expanding medical data repositories, advances in neural network architecture (especially transformers), self-supervised learning approaches and computing power. Medical foundation models are capable of performing, or can be adapted to perform, a range of medical tasks with a minimal amount of annotated data. To date, several promising breakthroughs with foundation models have been demonstrated across diverse medical domains, including pathology, radiology and ophthalmology^3,4,5.

Nevertheless, training robust medical foundation models requires large, diverse and clinically useful representative data^6. Assembling such datasets remains a major challenge for the research community because of strict data-sharing regulations intended to protect patient privacy and ensure ethical compliance. For these reasons, most existing foundational models are trained on datasets that are geographically and demographically ‘narrow’ (that is, not globally representative), limiting their generalizability and effectiveness, particularly in under-represented regions and populations^6,7,8.




Source journal: Nature Medicine (Medicine: Biochemistry & Molecular Biology)

CiteScore: 100.90

Self-citation rate: 0.70%

Articles published: 525

Review time: 1 month

About the journal: Nature Medicine is a monthly journal publishing original peer-reviewed research in all areas of medicine. The publication focuses on originality, timeliness, interdisciplinary interest, and the impact on improving human health. In addition to research articles, Nature Medicine also publishes commissioned content such as News, Reviews, and Perspectives. This content aims to provide context for the latest advances in translational and clinical research, reaching a wide audience of M.D. and Ph.D. readers. All editorial decisions for the journal are made by a team of full-time professional editors.

Nature Medicine considers all types of clinical research, including:

- Case reports and small case series
- Clinical trials, whether phase 1, 2, 3 or 4
- Observational studies
- Meta-analyses
- Biomarker studies
- Public and global health studies

Nature Medicine is also committed to facilitating communication between translational and clinical researchers. As such, we consider “hybrid” studies with preclinical and translational findings reported alongside data from clinical studies.