Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu
{"title":"视觉基础模型能否增强医学图像分割的领域泛化?","authors":"Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu","doi":"arxiv-2409.07960","DOIUrl":null,"url":null,"abstract":"Neural networks achieve state-of-the-art performance in many supervised\nlearning tasks when the training data distribution matches the test data\ndistribution. However, their performance drops significantly under domain\n(covariate) shift, a prevalent issue in medical image segmentation due to\nvarying acquisition settings across different scanner models and protocols.\nRecently, foundational models (FMs) trained on large datasets have gained\nattention for their ability to be adapted for downstream tasks and achieve\nstate-of-the-art performance with excellent generalization capabilities on\nnatural images. However, their effectiveness in medical image segmentation\nremains underexplored. In this paper, we investigate the domain generalization\nperformance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when\nfine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such\nas Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head\narchitecture, HQHSAM, which simply integrates elements from two\nstate-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation\nperformance. Our extensive experiments on multiple datasets, encompassing\nvarious anatomies and modalities, reveal that FMs, particularly with the HQHSAM\ndecode head, improve domain generalization for medical image segmentation.\nMoreover, we found that the effectiveness of PEFT techniques varies across\ndifferent FMs. These findings underscore the potential of FMs to enhance the\ndomain generalization performance of neural networks in medical image\nsegmentation across diverse clinical settings, providing a solid foundation for\nfuture research. Code and models are available for research purposes at\n\\url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?\",\"authors\":\"Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu\",\"doi\":\"arxiv-2409.07960\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural networks achieve state-of-the-art performance in many supervised\\nlearning tasks when the training data distribution matches the test data\\ndistribution. However, their performance drops significantly under domain\\n(covariate) shift, a prevalent issue in medical image segmentation due to\\nvarying acquisition settings across different scanner models and protocols.\\nRecently, foundational models (FMs) trained on large datasets have gained\\nattention for their ability to be adapted for downstream tasks and achieve\\nstate-of-the-art performance with excellent generalization capabilities on\\nnatural images. However, their effectiveness in medical image segmentation\\nremains underexplored. 
In this paper, we investigate the domain generalization\\nperformance of various FMs, including DinoV2, SAM, MedSAM, and MAE, when\\nfine-tuned using various parameter-efficient fine-tuning (PEFT) techniques such\\nas Ladder and Rein (+LoRA) and decoder heads. We introduce a novel decode head\\narchitecture, HQHSAM, which simply integrates elements from two\\nstate-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation\\nperformance. Our extensive experiments on multiple datasets, encompassing\\nvarious anatomies and modalities, reveal that FMs, particularly with the HQHSAM\\ndecode head, improve domain generalization for medical image segmentation.\\nMoreover, we found that the effectiveness of PEFT techniques varies across\\ndifferent FMs. These findings underscore the potential of FMs to enhance the\\ndomain generalization performance of neural networks in medical image\\nsegmentation across diverse clinical settings, providing a solid foundation for\\nfuture research. Code and models are available for research purposes at\\n\\\\url{https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery}.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07960\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07960","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?
Neural networks achieve state-of-the-art performance in many supervised
learning tasks when the training data distribution matches the test data
distribution. However, their performance drops significantly under domain
(covariate) shift, a prevalent issue in medical image segmentation due to
varying acquisition settings across different scanner models and protocols.
Recently, foundation models (FMs) trained on large datasets have gained
attention for their ability to be adapted for downstream tasks and achieve
state-of-the-art performance with excellent generalization capabilities on
natural images. However, their effectiveness in medical image segmentation
remains underexplored. In this paper, we investigate the domain generalization
performance of several FMs, including DinoV2, SAM, MedSAM, and MAE, when
fine-tuned using parameter-efficient fine-tuning (PEFT) techniques such as
Ladder and Rein (+LoRA), together with different decoder heads. We introduce a
novel decoder head architecture, HQHSAM, which simply integrates elements from
two state-of-the-art decoder heads, HSAM and HQSAM, to enhance segmentation
performance. Our extensive experiments on multiple datasets, encompassing
various anatomies and modalities, reveal that FMs, particularly with the HQHSAM
decoder head, improve domain generalization for medical image segmentation.
Moreover, we found that the effectiveness of PEFT techniques varies across
different FMs. These findings underscore the potential of FMs to enhance the
domain generalization performance of neural networks in medical image
segmentation across diverse clinical settings, providing a solid foundation for
future research. Code and models are available for research purposes at
https://github.com/kerem-cekmeceli/Foundation-Models-for-Medical-Imagery.
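For readers who want a concrete picture of how a PEFT technique such as LoRA is typically attached to a frozen foundation-model backbone, the following is a minimal PyTorch sketch. It is an illustration under assumptions, not the authors' implementation: `LoRALinear`, `add_lora_to_encoder`, and the `encoder.blocks` / `attn.qkv` layout are hypothetical names chosen for a DINOv2/timm-style ViT; the actual code is in the linked repository.

```python
# Minimal, illustrative LoRA-style adapter (assumed structure, not the paper's code).
# A frozen linear layer from a pretrained backbone is augmented with a low-rank
# trainable update: y = W x + (alpha / r) * B A x.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep pretrained weights frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # A: d_in -> r
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # B: r -> d_out
        nn.init.zeros_(self.lora_b.weight)        # update starts at zero, so the wrapped
        self.scale = alpha / rank                 # layer initially behaves like the base layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


# Example: wrap the qkv projection of every attention block in a frozen ViT-like
# encoder; the block/attribute names below are assumptions about the backbone layout.
def add_lora_to_encoder(encoder: nn.Module, rank: int = 4) -> nn.Module:
    for block in encoder.blocks:
        block.attn.qkv = LoRALinear(block.attn.qkv, rank=rank)
    return encoder
```

With this wrapping, only the low-rank matrices A and B (plus the task-specific decoder head) receive gradients, which is what makes the adaptation parameter-efficient.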
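The abstract describes HQHSAM only as a combination of elements from HSAM and HQSAM. Purely as an illustration of that kind of design (a coarse prediction followed by refinement with fused early and deep encoder features), a generic two-stage decoder head could look like the sketch below. The class name, layer choices, and wiring are assumptions for exposition and are not claimed to match the paper's HQHSAM.

```python
# Illustrative two-stage decoder head (assumed structure, not the paper's HQHSAM).
# Stage 1 predicts a coarse mask from deep encoder features; stage 2 refines it
# using a fusion of early and deep features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStageDecoderHead(nn.Module):
    def __init__(self, early_dim: int, deep_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.coarse = nn.Sequential(              # stage 1: coarse logits from deep features
            nn.Conv2d(deep_dim, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, num_classes, 1),
        )
        self.fuse = nn.Conv2d(early_dim + deep_dim, hidden, 1)
        self.refine = nn.Sequential(              # stage 2: refine using fused features + coarse mask
            nn.Conv2d(hidden + num_classes, hidden, 3, padding=1), nn.GELU(),
            nn.Conv2d(hidden, num_classes, 1),
        )

    def forward(self, early_feat: torch.Tensor, deep_feat: torch.Tensor) -> torch.Tensor:
        coarse = self.coarse(deep_feat)           # coarse logits at the deep-feature resolution
        early = F.interpolate(early_feat, size=deep_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([early, deep_feat], dim=1))
        refined = self.refine(torch.cat([fused, coarse], dim=1))
        return refined + coarse                   # residual refinement of the coarse mask
```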