Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography.

Yuexi Du, John A Onofrey, Nicha C Dvornek
{"title":"Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography.","authors":"Yuexi Du, John A Onofrey, Nicha C Dvornek","doi":"10.1007/978-3-031-96625-5_17","DOIUrl":null,"url":null,"abstract":"<p><p>Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class-wise imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% model size compared with the largest baseline.</p>","PeriodicalId":73379,"journal":{"name":"Information processing in medical imaging : proceedings of the ... conference","volume":"15830 ","pages":"247-262"},"PeriodicalIF":0.0000,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456755/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information processing in medical imaging : proceedings of the ... conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/978-3-031-96625-5_17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/7 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class-wise imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% of the model size of the largest baseline.
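To make the contrastive objective concrete, below is a minimal sketch of the standard symmetric CLIP loss that the abstract's alignment framework builds on. The function name, temperature value, and tensor shapes are illustrative assumptions; this is not the authors' MaMA implementation and omits the multi-view supervision, local alignment, and parameter-efficient fine-tuning components described above.

```python
# A minimal, hypothetical sketch of a symmetric CLIP-style contrastive loss.
# Names and the temperature are illustrative assumptions, not MaMA code.
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image/report pairs.

    image_emb: (B, D) image embeddings (e.g., one mammogram view per study).
    text_emb:  (B, D) text embeddings of the paired reports.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric loss: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```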
