Towards a CMR Foundation Model for Multi-Task Cardiac Image Analysis.

IF 6.1 | Zone 1 (Medicine) | JCR Q1 | Cardiac & Cardiovascular Systems

Journal of Cardiovascular Magnetic Resonance | Pub Date: 2025-10-02 | DOI: 10.1016/j.jocmr.2025.101967

Athira J Jacob, Indraneel Borgohain, Teodora Chitiboi, Puneet Sharma, Dorin Comaniciu, Daniel Rueckert


Citations: 0

Abstract

Background: Cardiac magnetic resonance (CMR) is a complex imaging modality that requires a broad variety of image processing tasks for comprehensive assessment of a study. Recently, foundation models (FMs) have shown promise for automated image analysis of natural images (NI). In this study, a CMR-specific vision FM was developed and then finetuned in a supervised manner for 9 different imaging tasks typical of a CMR workflow, including classification, segmentation, landmark localization, and pathology detection.

Methods: A ViT-S/8 model was trained in a self-supervised manner using DINO on 36 million CMR images from 27,524 subjects from three sources (UK Biobank and two clinical centers). The model was then finetuned for 9 tasks: classification (sequence, cine view), segmentation (cine short-axis (SAX), cine long-axis (LAX), late gadolinium enhancement (LGE) SAX, mapping SAX), landmark localization, and pathology detection (LGE, cardiac disease), using data from various sources (public datasets and 3 clinical datasets). The results were compared against metrics from state-of-the-art methods on the same tasks. A comparable baseline model was also trained on the same datasets for direct comparison. Additionally, the effect of the pretraining strategy, as well as generalization and few-shot performance (training on few labeled samples), was explored for the pretrained model and compared to the baseline.
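The self-distillation objective behind DINO can be illustrated with a minimal sketch. It is not the authors' training code: it assumes the standard DINO formulation (a student network trained to match a centered, temperature-sharpened teacher distribution, with the teacher weights and the center maintained as exponential moving averages), and the temperatures, momentum values, and function names below are illustrative assumptions rather than settings reported in the abstract.

```python
import math

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student distributions for one view pair.

    The teacher output is centered and sharpened with a low temperature;
    in training, no gradient would flow through the teacher branch.
    Temperatures are illustrative assumptions.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))

def ema_update(old, new, momentum):
    """Exponential moving average, used here for teacher weights and the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]

# Toy projection-head outputs for two augmented views of the same image.
teacher_out = [2.0, 0.5, 0.1]
student_agree = [2.0, 0.5, 0.1]
student_disagree = [0.1, 0.5, 2.0]
center = [0.0, 0.0, 0.0]

loss_agree = dino_loss(student_agree, teacher_out, center)
loss_disagree = dino_loss(student_disagree, teacher_out, center)

# After each step the center drifts toward the teacher outputs
# (momentum 0.9 is an assumed value).
center = ema_update(center, teacher_out, momentum=0.9)
```

The loss is small when the student assigns high probability to the teacher's dominant output dimension and large when the two views disagree, which is what lets the encoder learn from unlabeled images.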

Results: The proposed model obtained performance similar to, or moderately better than, results reported in the literature in most tasks (except disease detection), without any task-specific optimization of methodology. The proposed model outperformed the baseline in most cases, with an average increase of 6.8 percentage points (pp) for cine view classification, and 0.1 to 1.8 pp for segmentation tasks. The proposed method also obtained generally lower standard deviations in the metrics. Improvements of 3.7 and 6.6 pp for hyperenhancement detection from LGE and 14 pp for disease detection were observed. Ablation studies highlighted the importance of pretraining strategy and architecture, as well as the impact of domain shifts from pretraining to finetuning. Moreover, the CMR-pretrained model achieved better generalization and few-shot performance than the baseline.
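Because the gains above are reported in percentage points, a short sketch may help separate an absolute pp difference from a relative improvement. The baseline and proposed accuracies below are hypothetical values chosen only so that the gap is 6.8 pp; they are not figures from the paper, and the function names are illustrative.

```python
def percentage_point_gain(baseline_pct, proposed_pct):
    """Absolute difference between two percentages, in percentage points."""
    return proposed_pct - baseline_pct

def relative_gain_pct(baseline_pct, proposed_pct):
    """Difference expressed as a percentage of the baseline value."""
    return 100.0 * (proposed_pct - baseline_pct) / baseline_pct

# Hypothetical accuracies, chosen only for illustration (not from the paper).
baseline_acc = 80.0
proposed_acc = 86.8

pp = percentage_point_gain(baseline_acc, proposed_acc)
rel = relative_gain_pct(baseline_acc, proposed_acc)
print(f"{pp:.1f} pp absolute, {rel:.1f} % relative to the baseline")
```

The same pp gain corresponds to a larger relative gain when the baseline is lower, so pp values across tasks are best read alongside the underlying baseline metrics.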

Conclusions: Vision FMs specialized for medical imaging can improve accuracy and robustness over NI FMs. Self-supervised pretraining offers a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even when little annotated data is available.




Source journal

Journal of Cardiovascular Magnetic Resonance (Medicine: Nuclear Medicine)

CiteScore

10.90

Self-citation rate

12.50%

Articles published

Review time

6-12 weeks

About the journal: Journal of Cardiovascular Magnetic Resonance (JCMR) publishes high-quality articles on all aspects of basic, translational and clinical research on the design, development, manufacture, and evaluation of cardiovascular magnetic resonance (CMR) methods applied to the cardiovascular system. Topical areas include, but are not limited to: New applications of magnetic resonance to improve the diagnostic strategies, risk stratification, characterization and management of diseases affecting the cardiovascular system. New methods to enhance or accelerate image acquisition and data analysis. Results of multicenter, or larger single-center studies that provide insight into the utility of CMR. Basic biological perceptions derived by CMR methods.