A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images

IF 15.7 | Zone 1, Multidisciplinary journals | Q1 MULTIDISCIPLINARY SCIENCES

Nature Communications | Pub Date: 2025-03-10 | DOI: 10.1038/s41467-025-57587-y

Zhaochang Yang, Ting Wei, Ying Liang, Xin Yuan, RuiTian Gao, Yujia Xia, Jie Zhou, Yue Zhang, Zhangsheng Yu


Citations: 0

Abstract

Computational pathology, utilizing whole slide images (WSIs) for pathological diagnosis, has advanced the development of intelligent healthcare. However, the scarcity of annotated data and histological differences hinder the general application of existing methods. Extensive histopathological data and the robustness of self-supervised models on small-scale data demonstrate promising prospects for developing foundation pathology models. Here we present BEPH (BEiT-based model Pre-training on Histopathological image), a foundation model that leverages self-supervised learning to learn meaningful representations from 11 million unlabeled histopathological images. These representations are then efficiently adapted to various tasks, including patch-level cancer diagnosis, WSI-level cancer classification, and survival prediction for multiple cancer subtypes. By leveraging the masked image modeling (MIM) pre-training approach, BEPH offers an efficient solution to enhance model performance, reduce the reliance on expert annotations, and facilitate the broader application of artificial intelligence in clinical settings. The pre-trained model is available at https://github.com/Zhcyoung/BEPH.
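The central technical ingredient named in the abstract is masked image modeling (MIM). As a rough illustration only, and not BEPH's actual code, the sketch below shows the generic BEiT-style objective: an image tile is split into a grid of patches, a fraction of patch positions is hidden, and the model is scored only on predicting the discrete "visual token" at each hidden position. The 224-pixel input, 16-pixel patches, 40% mask ratio, uniform random masking (the original BEiT recipe uses a block-wise masking variant), the placeholder token ids, and all function names are assumptions for illustration; the image tokenizer and the transformer encoder are omitted.

```python
import math
import random
from typing import List, Optional, Tuple


def num_patches(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of non-overlapping patches (tokens) a ViT-style encoder sees per tile.
    The 224/16 defaults are assumptions, not values taken from the BEPH paper."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def random_patch_mask(n: int, mask_ratio: float = 0.4,
                      seed: Optional[int] = None) -> List[bool]:
    """Boolean mask over patch positions; True means the patch is hidden from the encoder.
    Simplification: uniform random masking instead of BEiT's block-wise scheme."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    k = int(round(n * mask_ratio))
    hidden = set(rng.sample(range(n), k))
    return [i in hidden for i in range(n)]


def mim_targets(visual_tokens: List[int], mask: List[bool]) -> List[Tuple[int, int]]:
    """Pair each masked position with the discrete token the model must recover.
    In a real pipeline the token ids would come from a pre-trained image tokenizer
    (hypothetical here); unmasked positions produce no target."""
    return [(i, tok) for i, (tok, m) in enumerate(zip(visual_tokens, mask)) if m]


def masked_cross_entropy(logits: List[List[float]],
                         targets: List[Tuple[int, int]]) -> float:
    """Mean cross-entropy computed over masked positions only.
    logits[pos] holds unnormalized scores over the visual-token vocabulary."""
    if not targets:
        return 0.0
    total = 0.0
    for pos, tok in targets:
        row = logits[pos]
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[tok]  # -log softmax(row)[tok]
    return total / len(targets)


if __name__ == "__main__":
    n = num_patches()                      # 14 x 14 grid of patches
    mask = random_patch_mask(n, 0.4, seed=0)
    tokens = [i % 8 for i in range(n)]     # placeholder token ids, vocabulary of 8
    targets = mim_targets(tokens, mask)
    uniform_logits = [[0.0] * 8 for _ in range(n)]
    print(n, sum(mask), round(masked_cross_entropy(uniform_logits, targets), 4))
```

Because the loss ignores unmasked positions, the encoder can only lower it by inferring hidden tissue content from the visible context, which is why MIM needs no expert labels. With uniform logits the loss equals the log of the vocabulary size, a useful sanity check before any training.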


Source journal: Nature Communications (Biological Science Disciplines)

CiteScore: 24.90

Self-citation rate: 2.40%

Articles published: 6928

Review time: 3.7 months

Journal description: Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.