Andrea Moglia , Matteo Leccardi , Matteo Cavicchioli , Alice Maccarini , Marco Marcon , Luca Mainardi , Pietro Cerveri
{"title":"Generalist models in medical image segmentation: A survey and performance comparison with task-specific approaches","authors":"Andrea Moglia , Matteo Leccardi , Matteo Cavicchioli , Alice Maccarini , Marco Marcon , Luca Mainardi , Pietro Cerveri","doi":"10.1016/j.inffus.2025.103709","DOIUrl":null,"url":null,"abstract":"<div><div>Following the successful paradigm shift of large language models, which leverages pre-training on a massive corpus of data and fine-tuning on various downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) marked a milestone in the segmentation of natural images, inspiring the design of numerous architectures for medical image segmentation. In this survey, we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We begin with an introduction to the fundamental concepts that underpin their development. Then, we provide a taxonomy based on features fusion on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on SAM2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI, physical AI, and clinical translation. We publicly release a database-backed interactive app with all survey data (<span><span>https://hal9000-lab.github.io/GMMIS-Survey/</span><svg><path></path></svg></span>).</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103709"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S156625352500781X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Following the successful paradigm shift of large language models, which leverages pre-training on a massive corpus of data and fine-tuning on various downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) marked a milestone in the segmentation of natural images, inspiring the design of numerous architectures for medical image segmentation. In this survey, we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We begin with an introduction to the fundamental concepts that underpin their development. Then, we provide a taxonomy based on features fusion on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on SAM2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI, physical AI, and clinical translation. We publicly release a database-backed interactive app with all survey data (https://hal9000-lab.github.io/GMMIS-Survey/).
期刊介绍:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.