Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography.

Yuexi Du, John A Onofrey, Nicha C Dvornek
{"title":"Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography.","authors":"Yuexi Du, John A Onofrey, Nicha C Dvornek","doi":"10.1007/978-3-031-96625-5_17","DOIUrl":null,"url":null,"abstract":"<p><p>Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class-wise imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% model size compared with the largest baseline.</p>","PeriodicalId":73379,"journal":{"name":"Information processing in medical imaging : proceedings of the ... conference","volume":"15830 ","pages":"247-262"},"PeriodicalIF":0.0000,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456755/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information processing in medical imaging : proceedings of the ... conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/978-3-031-96625-5_17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/7 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class-wise imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% of the model size of the largest baseline.
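To make the contrastive objective concrete, below is a minimal sketch of the standard symmetric CLIP loss that the abstract's alignment framework builds on. The function name, temperature value, and tensor shapes are illustrative assumptions; this is not the authors' MaMA implementation and omits the multi-view supervision, local alignment, and parameter-efficient fine-tuning components described above.

```python
# A minimal, hypothetical sketch of a symmetric CLIP-style contrastive loss.
# Names and the temperature are illustrative assumptions, not MaMA code.
import torch
import torch.nn.functional as F


def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image/report pairs.

    image_emb: (B, D) image embeddings (e.g., one mammogram view per study).
    text_emb:  (B, D) text embeddings of the paired reports.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric loss: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```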
