Information Fusion: Latest Publications

Towards the generalization of multi-view learning: An information-theoretical analysis
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103776
Wen Wen, Tieliang Gong, Yuxin Dong, Shujian Yu, Bo Dong
{"title":"Towards the generalization of multi-view learning: An information-theoretical analysis","authors":"Wen Wen ,&nbsp;Tieliang Gong ,&nbsp;Yuxin Dong ,&nbsp;Shujian Yu ,&nbsp;Bo Dong","doi":"10.1016/j.inffus.2025.103776","DOIUrl":"10.1016/j.inffus.2025.103776","url":null,"abstract":"<div><div>Multiview learning has drawn widespread attention for its efficacy in leveraging cross-view consensus and complementarity information to achieve a comprehensive representation of data. While multi-view learning has undergone vigorous development and achieved remarkable success, the theoretical understanding of its generalization behavior remains elusive. This paper aims to bridge this gap by developing information-theoretic generalization bounds for multi-view learning, with a particular focus on multi-view reconstruction and classification tasks. Our bounds underscore the importance of capturing both consensus and complementary information from multiple different views to achieve maximally disentangled representations. These results also indicate that applying the multi-view information bottleneck regularizer is beneficial for satisfactory generalization performance. Additionally, we derive novel data-dependent bounds under both leave-one-out and supersample settings, yielding computationally tractable and tighter bounds. In the interpolating regime, we further establish the fast-rate bound for multi-view learning, exhibiting a faster convergence rate compared to conventional square-root bounds. Numerical results indicate a strong correlation between the true generalization gap and the derived bounds.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103776"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
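For orientation, a useful reference point for bounds of this kind is the classical input-output mutual information bound of Xu and Raginsky (2017), which this line of work builds on and tightens; it is not the paper's own result. For a hypothesis W learned from n i.i.d. samples S under a σ-sub-Gaussian loss, the expected generalization gap satisfies

```latex
\left|\,\mathbb{E}\bigl[L_\mu(W) - L_S(W)\bigr]\,\right|
  \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}
```

The leave-one-out and supersample bounds described in the abstract replace I(W;S) with smaller, computationally tractable conditional-information quantities, and the fast-rate result improves on this O(1/sqrt(n)) decay in the interpolating regime.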
Random forest of thoughts: Reasoning path fusion for LLM inference in computational social science
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103791
Xiaohua Wu, Xiaohui Tao, Wenjie Wu, Jianwei Zhang, Yuefeng Li, Lin Li
{"title":"Random forest of thoughts: Reasoning path fusion for LLM inference in computational social science","authors":"Xiaohua Wu ,&nbsp;Xiaohui Tao ,&nbsp;Wenjie Wu ,&nbsp;Jianwei Zhang ,&nbsp;Yuefeng Li ,&nbsp;Lin Li","doi":"10.1016/j.inffus.2025.103791","DOIUrl":"10.1016/j.inffus.2025.103791","url":null,"abstract":"<div><div>Large language models (LLMs) have demonstrated significant promise for reasoning problems. They are among the leading techniques for context inference, particularly in scenarios with strong sequential dependencies, where earlier inputs dynamically influence subsequent responses. However, existing reasoning paradigms such as X-of-thoughts (XoT) typically rely on unidirectional, left-to-right inference with limited inference paths. This renders them ineffective in handling inherent skip logic and multi-path reasoning, especially for contexts such as a multi-turn social survey. To address this, we propose Random Forest of Thoughts (RFoT), a novel prompting framework grounded in the principles of reasoning path fusion for skip logic. It uses Iterative Chain-of-Thought (ICoT) prompting to generate a diverse set of reasoning thoughts. These thoughts are then assessed using a cooperative contribution evaluator to estimate their contribution. By randomly sampling and fusing the top-<span><math><mi>k</mi></math></span> reasoning thoughts, RFoT simulates uncertain skip logic and constructs a rich forest of plausible thoughts. This enables it to achieve robust multi-path reasoning, where each question sequence formed by the skip logic is treated as an independent reasoning path. RFoT is validated on two classic social problems featuring strong skip logic, using three open-source LLMs and five datasets that have been categorized as structured social surveys and public social media data. Experimental results demonstrate that RFoT significantly enhances inference performance on problems that require complex, non-linear reasoning across both survey and social media data. The transparency and trustworthiness of the results stem from the interpretable fusion of diverse reasoning paths and the principled integration of cooperative evaluation mechanisms.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103791"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
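The core loop described in the abstract (generate diverse thoughts, score their contribution, then sample and fuse top-k subsets) can be made concrete with a minimal sketch. Everything below is hypothetical scaffolding around stubbed LLM calls, not the paper's implementation:

```python
import random
from collections import Counter

def rfot_answer(question, generate_thought, score_thought, answer_with,
                n_thoughts=12, top_k=6, n_paths=8, seed=0):
    """Illustrative sketch of the RFoT idea (function names are hypothetical):
    generate diverse reasoning thoughts, keep the top-k by an estimated
    contribution score, then fuse answers from randomly sampled thought
    subsets, each subset standing in for one skip-logic reasoning path."""
    rng = random.Random(seed)
    thoughts = [generate_thought(question) for _ in range(n_thoughts)]
    ranked = sorted(thoughts, key=lambda t: score_thought(question, t), reverse=True)
    pool = ranked[:top_k]
    answers = []
    for _ in range(n_paths):
        # random subset of top thoughts simulates an uncertain skip-logic path
        path = rng.sample(pool, k=rng.randint(1, len(pool)))
        answers.append(answer_with(question, path))
    # majority-vote fusion; answers assumed hashable (e.g., option strings)
    return Counter(answers).most_common(1)[0][0]
```

In use, `generate_thought`, `score_thought`, and `answer_with` would wrap prompted LLM calls; the random subset sampling is what turns a single chain into a forest of plausible paths.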
CMF: Prediction refinement via complementary manifold-based multi-model fusion
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-28 DOI: 10.1016/j.inffus.2025.103782
Bocheng Zhao, Wenxing Zhang, Lei Bao, Wucheng Wang, Zhenyu Kong, Qiguang Miao
{"title":"CMF: Prediction refinement via complementary manifold-based multi-model fusion","authors":"Bocheng Zhao ,&nbsp;Wenxing Zhang ,&nbsp;Lei Bao ,&nbsp;Wucheng Wang ,&nbsp;Zhenyu Kong ,&nbsp;Qiguang Miao","doi":"10.1016/j.inffus.2025.103782","DOIUrl":"10.1016/j.inffus.2025.103782","url":null,"abstract":"<div><div>In current research on multi-model fusion, mainstream approaches predominantly focus on the design of fusion algorithms, while often overlooking the filtering or selection of outputs from individual base models prior to fusion. Moreover, most existing fusion methods exhibit a high degree of coupling, which limits their flexibility and adaptability in cross-scene applications. Consequently, once the fusion is completed, the model architecture tends to become fixed, making it difficult to integrate new models or replace outdated components. To address these limitations and achieve effective state-of-the-art (SOTA) breakthroughs in diverse single-label image classification tasks-such as fine-grained recognition or long-tailed distributions-without being constrained by model architecture, this paper proposes a highly generalizable multi-model complementary method. The proposed approach is applicable to single-label multi-class classification tasks in any deep learning domain and has achieved global SOTA performance on multiple image classification benchmarks. It imposes no restrictions on the architecture, parameter settings, or training strategies of the base models, enabling direct integration of existing SOTA models. Furthermore, the fusion process is fully decoupled, ensuring that the independent training of each base model remains unaffected and preserving the inherent advantages of their original training paradigms.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103782"},"PeriodicalIF":15.5,"publicationDate":"2025-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
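To illustrate what a fully decoupled fusion stage looks like in the abstract's sense (base models trained independently, outputs filtered before fusion), here is a minimal sketch. The confidence-thresholded averaging is an assumption for illustration, not the paper's complementary-manifold method:

```python
import numpy as np

def complementary_fuse(prob_list, conf_threshold=0.5):
    """Hypothetical decoupled fusion: each element of prob_list is an
    (n_samples, n_classes) softmax output from an independently trained base
    model. Per sample, outputs whose top-1 confidence falls below the
    threshold are filtered out before averaging, so weak predictions do not
    dilute the fused result. No base model is retrained or modified."""
    probs = np.stack(prob_list)                      # (n_models, n, c)
    conf = probs.max(axis=-1)                        # (n_models, n)
    mask = (conf >= conf_threshold).astype(float)    # filtering step
    # if every model is filtered out for a sample, fall back to using all
    mask = np.where(mask.sum(0, keepdims=True) == 0, 1.0, mask)
    fused = (probs * mask[..., None]).sum(0) / mask.sum(0)[..., None]
    return fused.argmax(-1)                          # fused class predictions
```

Because fusion only consumes finished probability vectors, swapping a base model in or out leaves every other component untouched, which is the decoupling property the abstract emphasizes.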
JSS-CLIP: Boosting image-to-video transfer learning with JigSaw side network
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-27 DOI: 10.1016/j.inffus.2025.103775
Dan Liu, Zhouli Shen, Ai Peng, Zhiyuan Ma, Jinpeng Mi, Mao Ye, Jianwei Zhang
{"title":"JSS-CLIP: Boosting image-to-video transfer learning with JigSaw side network","authors":"Dan Liu ,&nbsp;Zhouli Shen ,&nbsp;Ai Peng ,&nbsp;Zhiyuan Ma ,&nbsp;Jinpeng Mi ,&nbsp;Mao Ye ,&nbsp;Jianwei Zhang","doi":"10.1016/j.inffus.2025.103775","DOIUrl":"10.1016/j.inffus.2025.103775","url":null,"abstract":"<div><div>Large pre-trained vision-language models, such as CLIP, have achieved remarkable success in computer vision. However, the challenge of extending image-based models to video understanding through effective temporal modeling remains an open problem. Although recent studies have shifted their focus towards image-to-video transfer learning, the majority of existing methods overlook algorithm efficiency when adapting large models to the video domain. In this paper, we propose an innovative JigSaw Side network, JSS-CLIP, aiming to balance the algorithm efficiency and spatiotemporal modeling performance for video action recognition. Specifically, we introduce lightweight side networks attached to the frozen vision model, which avoids the backpropagation through the computationally intensive pre-trained model, thereby significantly reducing computational costs. Additionally, we design an implicit alignment module to guide the generation of hierarchical spatiotemporal JigSaw feature maps. These feature maps encapsulate rich motion information and action cues within videos, facilitating a comprehensive understanding of dynamic content. We conduct extensive experiments on three large-scale action datasets, whose results consistently demonstrate the competitiveness of JSS-CLIP in terms of efficiency and performance. The source code will be released at https://github.com/liarshen/JSS-CLIP.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103775"},"PeriodicalIF":15.5,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
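The efficiency argument (no backpropagation through the frozen CLIP encoder, only through a lightweight side branch) is easy to see in code. The sketch below is a generic side-network skeleton under assumed shapes; the module names, dimensions, and GRU temporal head are illustrative stand-ins, not JSS-CLIP's JigSaw modules:

```python
import torch
import torch.nn as nn

class SideAdapter(nn.Module):
    """Minimal side-network sketch: a small trainable branch runs alongside a
    frozen backbone and consumes its per-frame features, so gradients never
    enter the heavy pre-trained weights."""
    def __init__(self, backbone, feat_dim=768, side_dim=96, n_classes=400):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)                   # freeze CLIP-style encoder
        self.proj = nn.Linear(feat_dim, side_dim)     # project into the side branch
        self.temporal = nn.GRU(side_dim, side_dim, batch_first=True)
        self.head = nn.Linear(side_dim, n_classes)

    def forward(self, frames):                        # frames: (B, T, C, H, W)
        B, T = frames.shape[:2]
        with torch.no_grad():                         # no gradients through backbone
            # backbone assumed to map a batch of images to (N, feat_dim) vectors
            feats = self.backbone(frames.flatten(0, 1))
        x = self.proj(feats.view(B, T, -1))
        x, _ = self.temporal(x)                       # lightweight temporal modeling
        return self.head(x.mean(dim=1))               # clip-level action logits
```

Training only `proj`, `temporal`, and `head` keeps both memory and compute far below full fine-tuning, which is the trade-off the paper targets.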
Ensemble of KalmanNets with innovation-based attention for robust target tracking
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-27 DOI: 10.1016/j.inffus.2025.103777
Marco Mari, Lauro Snidaro
{"title":"Ensemble of KalmanNets with innovation-based attention for robust target tracking","authors":"Marco Mari,&nbsp;Lauro Snidaro","doi":"10.1016/j.inffus.2025.103777","DOIUrl":"10.1016/j.inffus.2025.103777","url":null,"abstract":"<div><div>Model-based tracking algorithms often suffer from significant performance degradation when tracking maneuvering targets, primarily due to inherent uncertainties in target dynamics. To address this limitation, we propose a novel ensemble-based approach that integrates multiple neural-aided Kalman filters, referred to as KalmanNet, within a multiple-model framework, inspired by traditional interacting multiple-model (IMM) filtering techniques. Each KalmanNet instance is specialized in tracking targets governed by a distinct motion model. The ensemble fuses their state estimates using a Recurrent Neural Network (RNN), which learns to adaptively weigh and combine the predictions based on the underlying target dynamics. This fusion mechanism enables the system to model complex motion patterns more effectively and achieves lower estimation bias and variance compared to relying on a single KalmanNet when tracking maneuvering targets, as demonstrated through extensive simulation experiments. Furthermore, we introduce an explainable, innovation-based attention mechanism to enhance the interpretability of our results, inspired by traditional model-based tracking algorithms, that aids the identification of target motion dynamics. Our findings indicate that this attention mechanism improves robustness to sensor noise, out-of-distribution data, and missing measurements. Overall, this innovative approach has the potential to advance state-of-the-art target tracking applications.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103777"},"PeriodicalIF":15.5,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
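Since the attention mechanism is innovation-based and IMM-inspired, the classical IMM model-probability update is the natural reference point. The sketch below computes fusion weights from Gaussian innovation likelihoods, the traditional mechanism that the paper's learned RNN fusion generalizes; it is not the paper's network:

```python
import numpy as np

def innovation_weights(innovations, covariances, prior=None):
    """IMM-style weighting: each filter's innovation nu_i, with innovation
    covariance S_i, is scored by its Gaussian likelihood N(nu_i; 0, S_i).
    Normalized likelihoods become fusion weights, so filters whose motion
    model matches the current target dynamics dominate the estimate."""
    m = len(innovations)
    prior = np.full(m, 1.0 / m) if prior is None else np.asarray(prior)
    lik = np.empty(m)
    for i, (nu, S) in enumerate(zip(innovations, covariances)):
        d = len(nu)
        norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
        lik[i] = norm * np.exp(-0.5 * nu @ np.linalg.solve(S, nu))
    w = prior * lik
    return w / w.sum()

def fuse_estimates(states, weights):
    """Weighted mixture mean of the per-filter state estimates."""
    return sum(w * x for w, x in zip(weights, np.asarray(states)))
```

A large innovation relative to its predicted covariance signals a mismatched motion model, so that filter's weight collapses; this is also why the innovation sequence is a natural, explainable attention signal.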
Addressing information bottlenecks in graph augmented large language models via graph neural summarization
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-27 DOI: 10.1016/j.inffus.2025.103784
Wooyoung Kim, Wooju Kim
{"title":"Addressing information bottlenecks in graph augmented large language models via graph neural summarization","authors":"Wooyoung Kim,&nbsp;Wooju Kim","doi":"10.1016/j.inffus.2025.103784","DOIUrl":"10.1016/j.inffus.2025.103784","url":null,"abstract":"<div><div>This study investigates the problem of information bottlenecks in graph-level prompting, where compressing all node embeddings into a single vector leads to significant structural information loss. We clarify and systematically analyze this challenge, and propose the Graph Neural Summarizer (GNS), a continuous prompting framework that generates multiple query-aware prompt vectors to better preserve graph structure and improve context relevance. Experiments on ExplaGraphs, SceneGraphs, and WebQSP show that GNS consistently improves performance over strong graph-level prompting baselines. These findings emphasize the importance of addressing information bottlenecks when integrating graph-structured data with large language models. Implementation details and source code are publicly available at <span><span>https://github.com/timothy-coshin/GraphNeuralSummarizer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103784"},"PeriodicalIF":15.5,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
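The contrast between single-vector graph prompting and multiple query-aware prompt vectors can be sketched as cross-attention pooling. The shapes and the attention form below are assumptions for illustration, not the GNS architecture:

```python
import numpy as np

def query_aware_prompts(node_emb, queries):
    """Sketch of multi-vector graph prompting: instead of mean-pooling all
    node embeddings into one vector (the bottleneck), each learned query
    attends over the nodes and yields its own prompt vector, so different
    prompts can summarize different parts of the graph structure."""
    # node_emb: (n_nodes, d); queries: (n_prompts, d)
    scores = queries @ node_emb.T / np.sqrt(node_emb.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over nodes
    return attn @ node_emb                           # (n_prompts, d) prompt vectors
```

With n_prompts = 1 and uniform attention this collapses back to mean pooling, which makes the information-bottleneck argument concrete: a single vector cannot keep per-substructure detail that multiple query-conditioned vectors can.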
A channel-adaptive and plug-and-play framework for hyperspectral image analysis
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-26 DOI: 10.1016/j.inffus.2025.103770
Taiqin Chen, Hao Sha, Yifeng Wang, Yuan Jiang, Shuai Liu, Zikun Zhou, Ke Chen, Yongbing Zhang
{"title":"A channel- adaptive and plug-and- play framework for hyperspectral image analysis","authors":"Taiqin Chen ,&nbsp;Hao Sha ,&nbsp;Yifeng Wang ,&nbsp;Yuan Jiang ,&nbsp;Shuai Liu ,&nbsp;Zikun Zhou ,&nbsp;Ke Chen ,&nbsp;Yongbing Zhang","doi":"10.1016/j.inffus.2025.103770","DOIUrl":"10.1016/j.inffus.2025.103770","url":null,"abstract":"<div><div>HyperSpectral Image (HSI) reflects rich properties of matter and facilitates distinguishing various objects, demonstrating substantial potential in a wide range of applications, including medical diagnosis and remote sensing. However, HSI exhibits variable number of channels due to the variations in acquisition equipments, which makes existing HSI analytical methods fail to utilize data from multiple equipments. To address this challenge, we first distill HSIs with varying channels into principal and residual components. We then develop a Fusion-Guided Network (FGNet) to transform the two distilled components into fused images with a fixed number of channels and perform channel-adaptive HSI analysis. To enable the fused images to maintain intensity, structure, and texture information in the original HSI, we generate pseudo labels to supervise the fusion. To facilitate the FGNet to extract more representative features, we further design a low-rank attention module (LGAM), leveraging the low-rank prior of HSI that few key information can represent a large amount of data. Moreover, the proposed framework can be applied as a plug-in to existing HSI analysis methods. We conducted extensive experiments on five HSI datasets including medical HSI segmentation task and remote sensing HSI classification task, which demonstrates the proposed method outperforms the state-of-the-art methods. We further experimentally identified that existing works can be seamlessly incorporated with our framework to achieve channel-adaptive ability and boost analytical performance. Code is available at https://github.com/hnsytq/FGNet.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103770"},"PeriodicalIF":15.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
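One simple way to realize a principal/residual split over a variable spectral axis is a per-cube PCA, shown below. This is a hypothetical illustration of the idea only; the abstract does not specify PCA, and FGNet's actual distillation may differ:

```python
import numpy as np

def distill_channels(hsi, n_principal=16):
    """Split an HSI cube of shape (H, W, C), with variable C, into a
    fixed-size principal part (top spectral components) and a residual part
    (reconstruction error), so downstream networks see a fixed channel count
    regardless of the acquisition device."""
    H, W, C = hsi.shape
    X = hsi.reshape(-1, C).astype(np.float64)
    Xc = X - X.mean(axis=0)
    # eigendecomposition of the C x C spectral covariance (ascending order)
    _, vecs = np.linalg.eigh(Xc.T @ Xc)
    basis = vecs[:, ::-1][:, :n_principal]           # top n_principal components
    principal = (Xc @ basis).reshape(H, W, n_principal)
    residual = (Xc - Xc @ basis @ basis.T).reshape(H, W, C)
    return principal, residual
```

The low-rank prior the abstract invokes is exactly what makes such a split lossless in practice: most spectral energy concentrates in a few components, leaving a small residual.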
Virtual PPG reconstruction from accelerometer data via adaptive denoising and cross-modal fusion
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-26 DOI: 10.1016/j.inffus.2025.103781
Illia Fedorin
{"title":"Virtual PPG reconstruction from accelerometer data via adaptive denoising and cross-Modal fusion","authors":"Illia Fedorin","doi":"10.1016/j.inffus.2025.103781","DOIUrl":"10.1016/j.inffus.2025.103781","url":null,"abstract":"<div><div>Accurate heart rate (HR) monitoring during high-intensity activity is essential for performance optimization and physiological tracking in wearable devices. While photoplethysmography (PPG) remains the standard for HR estimation, it is prone to motion artifacts, power constraints, and temporary signal loss. Accelerometers (ACC), by contrast, offer motion-resilient and energy-efficient sensing, but estimating HR from ACC alone remains a challenging task. In this study, we introduce a cross-modal virtual sensing framework for HR estimation and spectral reconstruction using only ACC signals. The framework includes: (1) a high-fidelity variational autoencoder (VAE) for offline PPG spectrum reconstruction from ACC input, and (2) a lightweight real-time attention-based denoising model for HR prediction. Both models are trained with a fusion-aware loss to enforce alignment between motion-driven and cardiovascular signal features. Experimental results on public and proprietary datasets demonstrate strong performance and generalization under varying sensor configurations and motion conditions. The real-time model achieves 7.0 BPM mean absolute error (MAE) with only 2.6K parameters, making it suitable for embedded deployment. While PPG remains superior under ideal conditions, the proposed system serves as a fallback modality when optical sensing is unreliable or unavailable-enabling gap-filling, post-processing correction, and low-power monitoring. More broadly, this work positions virtual PPG reconstruction as a proof-of-concept for physiological virtual sensing: a paradigm where one modality can be inferred from another, and potentially reversed, supporting robust multimodal inference in real-world mobile health scenarios.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103781"},"PeriodicalIF":15.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
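As a toy baseline that makes the spectral framing concrete (the paper's VAE reconstructs the full PPG spectrum and its real-time model is a learned denoiser; neither is this), HR can be read off as the dominant peak of an accelerometer channel's spectrum inside the physiological band:

```python
import numpy as np

def spectral_hr(acc, fs, lo_bpm=40, hi_bpm=220):
    """Naive spectral HR estimate from a single ACC channel: window the
    signal, take the magnitude spectrum, and return the peak frequency inside
    the physiological HR band, converted to beats per minute. In practice
    motion cadence often dominates this peak, which is precisely the
    cross-modal disentanglement problem the paper's models learn to solve."""
    acc = np.asarray(acc, dtype=float)
    acc = acc - acc.mean()                            # remove DC component
    freqs = np.fft.rfftfreq(len(acc), d=1.0 / fs)
    mag = np.abs(np.fft.rfft(acc * np.hanning(len(acc))))
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    return 60.0 * freqs[band][np.argmax(mag[band])]   # peak frequency in BPM
```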
Color and texture count alike: An underwater image enhancement method via dual-attention fusion
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-26 DOI: 10.1016/j.inffus.2025.103780
Guodong Fan, Shuteng Hu, Jingchun Zhou, Min Gan, C. L. Philip Chen
{"title":"Color and texture count alike: An underwater image enhancement method via dual-attention fusion","authors":"Guodong Fan ,&nbsp;Shuteng Hu ,&nbsp;Jingchun Zhou ,&nbsp;Min Gan ,&nbsp;C. L Phlip Chen","doi":"10.1016/j.inffus.2025.103780","DOIUrl":"10.1016/j.inffus.2025.103780","url":null,"abstract":"<div><div>Underwater image enhancement is a highly challenging task, requiring solutions to complex environmental degradation factors such as light attenuation and color cast. Achieving stability in color restoration and precision in texture recovery is key to improving enhancement results. However, existing methods generally lack in-depth modeling of color and texture information and fail to efficiently fuse these two core visual components, significantly limiting the overall performance of the enhancement results. To this end, we propose an innovative Dual-Attention Fusion Net (DuAF) that solves this problem. On a global scale, DuAF introduces explicit semantic consistency constraints to precisely model color features by reconstructing pixel intensity distribution, enhancing sensitivity to color features, and capturing real pixel gradient changes, effectively addressing complex color distortion issues. On a local scale, DuAF dynamically adjusts the perception window, combines optimized attention weights with positional deviations, and deeply models texture information, significantly improving the restoration of texture details. Overall, DuAF significantly improves the stability of color restoration and the clarity of texture details in complex degraded scenes, providing an efficient and comprehensive solution for underwater image enhancement. Our project is publicly available on <span><span>https://github.com/HuShuteng/DuAF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103780"},"PeriodicalIF":15.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalist models in medical image segmentation: A survey and performance comparison with task-specific approaches
IF 15.5, CAS Zone 1, Computer Science
Information Fusion Pub Date: 2025-09-26 DOI: 10.1016/j.inffus.2025.103709
Andrea Moglia, Matteo Leccardi, Matteo Cavicchioli, Alice Maccarini, Marco Marcon, Luca Mainardi, Pietro Cerveri
{"title":"Generalist models in medical image segmentation: A survey and performance comparison with task-specific approaches","authors":"Andrea Moglia ,&nbsp;Matteo Leccardi ,&nbsp;Matteo Cavicchioli ,&nbsp;Alice Maccarini ,&nbsp;Marco Marcon ,&nbsp;Luca Mainardi ,&nbsp;Pietro Cerveri","doi":"10.1016/j.inffus.2025.103709","DOIUrl":"10.1016/j.inffus.2025.103709","url":null,"abstract":"<div><div>Following the successful paradigm shift of large language models, which leverages pre-training on a massive corpus of data and fine-tuning on various downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) marked a milestone in the segmentation of natural images, inspiring the design of numerous architectures for medical image segmentation. In this survey, we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We begin with an introduction to the fundamental concepts that underpin their development. Then, we provide a taxonomy based on features fusion on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on SAM2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI, physical AI, and clinical translation. We publicly release a database-backed interactive app with all survey data (<span><span>https://hal9000-lab.github.io/GMMIS-Survey/</span><svg><path></path></svg></span>).</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103709"},"PeriodicalIF":15.5,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145229532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0