Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu
{"title":"视觉基础模型能否增强医学图像分割的领域泛化?","authors":"Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu","doi":"arxiv-2409.07960","DOIUrl":null,"url":null,"abstract":"Neural networks achieve state-of-the-art performance in many supervised\nlearning tasks when the training data distribution matches the test data\ndistribution. However, their performance drops significantly under domain\n(covariate) shift, a prevalent issue in medical image segmentation due to\nvarying acquisition settings across different scanner models and protocols.\nRecently, foundational models (FMs) trained on large datasets have gained\nattention for their ability to be adapted for downstream tasks and achieve\nstate-of-the-art performance with excellent generalization capabilities on\nnatural images. However, their effectiveness in medical image segmentation\nremains underexplored. In this paper, we investigate the domain generalization\nperformance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when\nfine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such\nas Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head\narchitecture, HQHSAM, which simply integrates elements from two\nstate-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation\nperformance. Our extensive experiments on multiple datasets, encompassing\nvarious anatomies and modalities, reveal that FMs, particularly with the HQHSAM\ndecode head, improve domain generalization for medical image segmentation.\nMoreover, we found that the effectiveness of PEFT techniques varies across\ndifferent FMs. These findings underscore the potential of FMs to enhance the\ndomain generalization performance of neural networks in medical image\nsegmentation across diverse clinical settings, providing a solid foundation for\nfuture research. Code and models are available for research purposes at\n\\url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?\",\"authors\":\"Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu\",\"doi\":\"arxiv-2409.07960\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural networks achieve state-of-the-art performance in many supervised\\nlearning tasks when the training data distribution matches the test data\\ndistribution. However, their performance drops significantly under domain\\n(covariate) shift, a prevalent issue in medical image segmentation due to\\nvarying acquisition settings across different scanner models and protocols.\\nRecently, foundational models (FMs) trained on large datasets have gained\\nattention for their ability to be adapted for downstream tasks and achieve\\nstate-of-the-art performance with excellent generalization capabilities on\\nnatural images. However, their effectiveness in medical image segmentation\\nremains underexplored. 
In this paper, we investigate the domain generalization\\nperformance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when\\nfine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such\\nas Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head\\narchitecture, HQHSAM, which simply integrates elements from two\\nstate-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation\\nperformance. Our extensive experiments on multiple datasets, encompassing\\nvarious anatomies and modalities, reveal that FMs, particularly with the HQHSAM\\ndecode head, improve domain generalization for medical image segmentation.\\nMoreover, we found that the effectiveness of PEFT techniques varies across\\ndifferent FMs. These findings underscore the potential of FMs to enhance the\\ndomain generalization performance of neural networks in medical image\\nsegmentation across diverse clinical settings, providing a solid foundation for\\nfuture research. Code and models are available for research purposes at\\n\\\\url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07960\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07960","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?
Neural networks achieve state-of-the-art performance in many supervised
learning tasks when the training data distribution matches the test data
distribution. However, their performance drops significantly under domain
(covariate) shift, a prevalent issue in medical image segmentation due to
varying acquisition settings across different scanner models and protocols.
Recently, foundation models (FMs) trained on large datasets have gained
attention for their ability to be adapted for downstream tasks and achieve
state-of-the-art performance with excellent generalization capabilities on
natural images. However, their effectiveness in medical image segmentation
remains underexplored. In this paper, we investigate the domain generalization
performance of several FMs, including DinoV2, SAM, MedSAM, and MAE, when
fine-tuned using parameter-efficient fine-tuning (PEFT) techniques such as
Ladder and Rein (+LoRA), together with different decoder heads. We introduce a
novel decoder head architecture, HQHSAM, which simply integrates elements from
two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation
performance. Our extensive experiments on multiple datasets, encompassing
various anatomies and modalities, reveal that FMs, particularly with the HQHSAM
decoder head, improve domain generalization for medical image segmentation.
Moreover, we found that the effectiveness of PEFT techniques varies across
different FMs. These findings underscore the potential of FMs to enhance the
domain generalization performance of neural networks in medical image
segmentation across diverse clinical settings, providing a solid foundation for
future research. Code and models are available for research purposes at
https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery.
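For readers who want a concrete picture of how a PEFT technique such as LoRA is typically attached to a frozen foundation-model backbone, the following is a minimal PyTorch sketch. It is an illustration under assumptions, not the authors' implementation: `LoRALinear`, `add_lora_to_encoder`, and the `encoder.blocks` / `attn.qkv` layout are hypothetical names chosen for a DINOv2/timm-style ViT; the actual code is in the linked repository.

```python
# Minimal, illustrative LoRA-style adapter (assumed structure, not the paper's code).
# A frozen linear layer from a pretrained backbone is augmented with a low-rank
# trainable update: y = W x + (alpha / r) * B A x.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep pretrained weights frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # A: d_in -> r
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # B: r -> d_out
        nn.init.zeros_(self.lora_b.weight)        # update starts at zero, so the wrapped
        self.scale = alpha / rank                 # layer initially behaves like the base layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Example: wrap the qkv projection of every attention block in a frozen ViT-like
# encoder; the block/attribute names below are assumptions about the backbone layout.
def add_lora_to_encoder(encoder: nn.Module, rank: int = 4) -> nn.Module:
    for block in encoder.blocks:
        block.attn.qkv = LoRALinear(block.attn.qkv, rank=rank)
    return encoder
```

With this wrapping, only the low-rank matrices A and B (plus the task-specific decoder head) receive gradients, which is what makes the adaptation parameter-efficient.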
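The abstract describes HQHSAM only as a combination of elements from HSAM and HQSAM. Purely as an illustration of that kind of design (a coarse prediction followed by refinement with fused early and deep encoder features), a generic two-stage decoder head could look like the sketch below. The class name, layer choices, and wiring are assumptions for exposition and are not claimed to match the paper's HQHSAM.

```python
# Illustrative two-stage decoder head (assumed structure, not the paper's HQHSAM).
# Stage 1 predicts a coarse mask from deep encoder features; stage 2 refines it
# using a fusion of early and deep features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStageDecoderHead(nn.Module):
    def __init__(self, early_dim: int, deep_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.coarse = nn.Sequential(              # stage 1: coarse logits from deep features
            nn.Conv2d(deep_dim, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, num_classes, 1),
        )
        self.fuse = nn.Conv2d(early_dim + deep_dim, hidden, 1)
        self.refine = nn.Sequential(              # stage 2: refine using fused features + coarse mask
            nn.Conv2d(hidden + num_classes, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, num_classes, 1),
        )

    def forward(self, early_feat: torch.Tensor, deep_feat: torch.Tensor) -> torch.Tensor:
        coarse = self.coarse(deep_feat)           # coarse logits at the deep-feature resolution
        early = F.interpolate(early_feat, size=deep_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([early, deep_feat], dim=1))
        refined = self.refine(torch.cat([fused, coarse], dim=1))
        return refined + coarse                   # residual refinement of the coarse mask
```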