Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?

Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu
{"title":"Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?","authors":"Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu","doi":"arxiv-2409.07960","DOIUrl":null,"url":null,"abstract":"Neural networks achieve state-of-the-art performance in many supervised\nlearning tasks when the training data distribution matches the test data\ndistribution. However, their performance drops significantly under domain\n(covariate) shift, a prevalent issue in medical image segmentation due to\nvarying acquisition settings across different scanner models and protocols.\nRecently, foundational models (FMs) trained on large datasets have gained\nattention for their ability to be adapted for downstream tasks and achieve\nstate-of-the-art performance with excellent generalization capabilities on\nnatural images. However, their effectiveness in medical image segmentation\nremains underexplored. In this paper, we investigate the domain generalization\nperformance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when\nfine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such\nas Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head\narchitecture, HQHSAM, which simply integrates elements from two\nstate-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation\nperformance. Our extensive experiments on multiple datasets, encompassing\nvarious anatomies and modalities, reveal that FMs, particularly with the HQHSAM\ndecode head, improve domain generalization for medical image segmentation.\nMoreover, we found that the effectiveness of PEFT techniques varies across\ndifferent FMs. These findings underscore the potential of FMs to enhance the\ndomain generalization performance of neural networks in medical image\nsegmentation across diverse clinical settings, providing a solid foundation for\nfuture research. 
Code and models are available for research purposes at\n\\url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07960","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Neural networks achieve state-of-the-art performance in many supervised learning tasks when the training data distribution matches the test data distribution. However, their performance drops significantly under domain (covariate) shift, a prevalent issue in medical image segmentation due to varying acquisition settings across scanner models and protocols. Recently, foundation models (FMs) trained on large datasets have gained attention for their ability to be adapted to downstream tasks, achieving state-of-the-art performance with excellent generalization on natural images. However, their effectiveness in medical image segmentation remains underexplored. In this paper, we investigate the domain generalization performance of several FMs, including DinoV2, SAM, MedSAM, and MAE, when fine-tuned with various parameter-efficient fine-tuning (PEFT) techniques, such as Ladder and Rein (+LoRA), and decoder heads. We introduce a novel decoder head architecture, HQHSAM, which integrates elements from two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation performance. Our extensive experiments on multiple datasets, spanning various anatomies and modalities, reveal that FMs, particularly with the HQHSAM decoder head, improve domain generalization for medical image segmentation. Moreover, we find that the effectiveness of PEFT techniques varies across FMs. These findings underscore the potential of FMs to enhance the domain generalization of neural networks for medical image segmentation across diverse clinical settings, providing a solid foundation for future research. Code and models are available for research purposes at https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery.
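To make the PEFT idea concrete: LoRA, one of the techniques the abstract mentions, freezes a pretrained weight matrix W and learns only a low-rank update (alpha/r) · B·A on top of it. The following is a minimal numpy sketch of this mechanism, not the paper's implementation; all names and shapes are illustrative.

```python
import numpy as np

class LoRALinear:
    """A linear layer with a frozen pretrained weight W plus a trainable
    low-rank update (alpha/r) * B @ A, as in LoRA-style PEFT."""

    def __init__(self, w_frozen: np.ndarray, rank: int = 4, alpha: float = 8.0):
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                  # frozen pretrained weight, never updated
        self.a = np.zeros((rank, d_in))    # trainable down-projection
        self.b = np.zeros((d_out, rank))   # trainable up-projection (zero-init)
        self.scale = alpha / rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Base path uses the frozen weight; the LoRA path adds the
        # low-rank correction. Only a and b would receive gradients.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T
```

Because B is initialized to zeros, the adapted layer initially reproduces the frozen model exactly, so fine-tuning starts from the pretrained behavior and only a small fraction of parameters (2 · rank · d per layer instead of d²) is updated.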