{"title":"Mamba-Sea: A Mamba-Based Framework With Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation","authors":"Zihan Cheng;Jintao Guo;Jian Zhang;Lei Qi;Luping Zhou;Yinghuan Shi;Yang Gao","doi":"10.1109/TMI.2025.3564765","DOIUrl":"10.1109/TMI.2025.3564765","url":null,"abstract":"To segment medical images with distribution shifts, domain generalization (DG) has emerged as a promising setting to train models on source domains that can generalize to unseen target domains. Existing DG methods are mainly based on CNN or ViT architectures. Recently, advanced state space models, represented by Mamba, have shown promising results in various supervised medical image segmentation. The success of Mamba is primarily owing to its ability to capture long-range dependencies while keeping linear complexity with input sequence length, making it a promising alternative to CNNs and ViTs. Inspired by the success, in the paper, we explore the potential of the Mamba architecture to address distribution shifts in DG for medical image segmentation. Specifically, we propose a novel Mamba-based framework, Mamba-Sea, incorporating global-to-local sequence augmentation to improve the model’s generalizability under domain shift issues. Our Mamba-Sea introduces a global augmentation mechanism designed to simulate potential variations in appearance across different sites, aiming to suppress the model’s learning of domain-specific information. At the local level, we propose a sequence-wise augmentation along input sequences, which perturbs the style of tokens within random continuous sub-sequences by modeling and resampling style statistics associated with domain shifts. To our best knowledge, Mamba-Sea is the first work to explore the generalization of Mamba for medical image segmentation, providing an advanced and promising Mamba-based architecture with strong robustness to domain shifts. Remarkably, our proposed method is the first to surpass a Dice coefficient of 90% on the Prostate dataset, which exceeds previous SOTA of 88.61%. The code is available at <uri>https://github.com/orange-czh/Mamba-Sea</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 9","pages":"3741-3755"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143893129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Rib Fracture Instance Segmentation and Classification From CT on the RibFrac Challenge","authors":"Jiancheng Yang;Rui Shi;Liang Jin;Xiaoyang Huang;Kaiming Kuang;Donglai Wei;Shixuan Gu;Jianying Liu;Pengfei Liu;Zhizhong Chai;Yongjie Xiao;Hao Chen;Liming Xu;Bang Du;Xiangyi Yan;Hao Tang;Adam Alessio;Gregory Holste;Jiapeng Zhang;Xiaoming Wang;Jianye He;Lixuan Che;Hanspeter Pfister;Ming Li;Bingbing Ni","doi":"10.1109/TMI.2025.3565514","DOIUrl":"10.1109/TMI.2025.3565514","url":null,"abstract":"Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website (<uri>https://ribfrac.grand-challenge.org/</uri>). In addition, we further analyzed the impact of two post-challenge advancements—large-scale pretraining and rib segmentation—based on our internal baseline for rib fracture detection. These findings lay a foundation for future research and development in AI-assisted rib fracture diagnosis.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3410-3427"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143893128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure Causal Models and LLMs Integration in Medical Visual Question Answering","authors":"Zibo Xu;Qiang Li;Weizhi Nie;Weijie Wang;Anan Liu","doi":"10.1109/TMI.2025.3564320","DOIUrl":"10.1109/TMI.2025.3564320","url":null,"abstract":"Medical Visual Question Answering (MedVQA) aims to answer medical questions according to medical images. However, the complexity of medical data leads to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task, which effectively eliminates the relative confounding effect between the image and the question to ensure the precision of the question-answering (QA) session. We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements, explicitly capturing how different questions influence visual features. During optimization, we apply the mutual information to discover spurious correlations and propose a multi-variable resampling front-door adjustment method to eliminate the relative confounding effect, which aims to align features based on their true causal relevance to the question-answering task. In addition, we also introduce a prompt strategy that combines multiple prompt forms to improve the model’s ability to understand complex medical data and answer accurately. Extensive experiments on three MedVQA datasets demonstrate that 1) our method significantly improves the accuracy of MedVQA, and 2) our method achieves true causal correlations in the face of complex medical data.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3476-3489"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143890058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSCPT: Few-Shot Whole Slide Image Classification With Multi-Scale and Context-Focused Prompt Tuning","authors":"Minghao Han;Linhao Qu;Dingkang Yang;Xukun Zhang;Xiaoying Wang;Lihua Zhang","doi":"10.1109/TMI.2025.3564976","DOIUrl":"10.1109/TMI.2025.3564976","url":null,"abstract":"Multiple instance learning (MIL) has become a standard paradigm for the weakly supervised classification of whole slide images (WSIs). However, this paradigm relies on using a large number of labeled WSIs for training. The lack of training data and the presence of rare diseases pose significant challenges for these methods. Prompt tuning combined with pre-trained Vision-Language models (VLMs) is an effective solution to the Few-shot Weakly Supervised WSI Classification (FSWC) task. Nevertheless, applying prompt tuning methods designed for natural images to WSIs presents three significant challenges: 1) These methods fail to fully leverage the prior knowledge from the VLM’s text modality; 2) They overlook the essential multi-scale and contextual information in WSIs, leading to suboptimal results; and 3) They lack exploration of instance aggregation methods. To address these problems, we propose a Multi-Scale and Context-focused Prompt Tuning (MSCPT) method for FSWC task. Specifically, MSCPT employs the frozen large language model to generate pathological visual language prior knowledge at multiple scales, guiding hierarchical prompt tuning. Additionally, we design a graph prompt tuning module to learn essential contextual information within WSI, and finally, a non-parametric cross-guided instance aggregation module has been introduced to derive the WSI-level features. Extensive experiments, visualizations, and interpretability analyses were conducted on five datasets and three downstream tasks using three VLMs, demonstrating the strong performance of our MSCPT. All codes have been made publicly accessible at <uri>https://github.com/Hanminghao/MSCPT</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 9","pages":"3756-3769"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143890056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-Pseudo Labeling and Active Selection for Fundus Single-Positive Multi-Label Learning","authors":"Tingxin Hu;Weihang Zhang;Jia Guo;Huiqi Li","doi":"10.1109/TMI.2025.3565000","DOIUrl":"10.1109/TMI.2025.3565000","url":null,"abstract":"Due to the difficulty of collecting multi-label annotations for retinal diseases, fundus images are usually annotated with only one label, while they actually have multiple labels. Given that deep learning requires accurate training data, incomplete disease information may lead to unsatisfactory classifiers and even misdiagnosis. To cope with these challenges, we propose a co-pseudo labeling and active selection method for Fundus Single-Positive multi-label learning, named FSP. FSP trains two networks simultaneously to generate pseudo labels through curriculum co-pseudo labeling and active sample selection. The curriculum co-pseudo labeling adjusts the thresholds according to the model’s learning status of each class. Then, the active sample selection maintains confident positive predictions with more precise pseudo labels based on loss modeling. A detailed experimental evaluation is conducted on seven retinal datasets. Comparison experiments show the effectiveness of FSP and its superiority over previous methods. Downstream experiments are also presented to validate the proposed method.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3428-3438"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Amplitude-Modulated Singular Value Decomposition for Ultrafast Ultrasound Imaging of Gas Vesicles","authors":"Ge Zhang;Mathis Vert;Mohamed Nouhoum;Esteban Rivera;Nabil Haidour;Anatole Jimenez;Thomas Deffieux;Simon Barral;Pascal Hersen;Sophie Pezet;Claire Rabut;Mikhail G. Shapiro;Mickael Tanter","doi":"10.1109/TMI.2025.3565023","DOIUrl":"10.1109/TMI.2025.3565023","url":null,"abstract":"Ultrasound imaging holds significant promise for the observation of molecular and cellular phenomena through the utilization of acoustic contrast agents and acoustic reporter genes. Optimizing imaging methodologies for enhanced detection represents an imperative advancement in this field. Most advanced techniques relying on amplitude modulation schemes such as cross amplitude modulation (xAM) and ultrafast amplitude modulation (uAM) combined with Hadamard encoded multiplane wave transmissions have shown efficacy in capturing the acoustic signals of gas vesicles (GVs). Nonetheless, uAM sequence requires odd- or even-element transmissions leading to imprecise amplitude modulation emitting scheme, and the complex multiplane wave transmission scheme inherently yields overlong pulse durations. xAM sequence is limited in terms of field of view and imaging depth. To overcome these limitations, we introduce an innovative ultrafast imaging sequence called amplitude-modulated singular value decomposition (SVD) processing. Our method demonstrates a contrast imaging sensitivity comparable to the current gold-standard xAM and uAM, while requiring 4.8 times fewer pulse transmissions. With a similar number of transmit pulses, amplitude-modulated SVD outperforms xAM and uAM in terms of an improvement in signal-to-background ratio of <inline-formula> <tex-math>$+ 4.78~pm ~0.35$ </tex-math></inline-formula> dB and <inline-formula> <tex-math>$+ 8.29~pm ~3.52$ </tex-math></inline-formula> dB, respectively. Furthermore, the method exhibits superior robustness across a wide range of acoustic pressures and enables high-contrast imaging in ex vivo and in vivo settings. Furthermore, amplitude-modulated SVD is envisioned to be applicable for the detection of slow moving microbubbles in ultrasound localization microscopy (ULM).","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3490-3501"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anomaly Detection in Medical Images Using Encoder-Attention-2Decoders Reconstruction","authors":"Peng Tang;Xiaoxiao Yan;Xiaobin Hu;Kai Wu;Tobias Lasser;Kuangyu Shi","doi":"10.1109/TMI.2025.3563482","DOIUrl":"10.1109/TMI.2025.3563482","url":null,"abstract":"Anomaly detection (AD) in medical applications is a promising field, offering a cost-effective alternative to labor-intensive abnormal data collection and labeling. However, the success of feature reconstruction-based methods in AD is often hindered by two critical factors: the domain gap of pre-trained encoders and the exploration of decoder potential. The EA2D method we propose overcomes these challenges, paving the way for more effective AD in medical imaging. In this paper, we present encoder-attention-2decoder (EA2D), a novel method tailored for medical AD. Firstly, EA2D is optimized through two tasks: a primary feature reconstruction task between the encoder and decoder, which detects anomalies based on reconstruction errors, and an auxiliary transformation-consistency contrastive learning task that explicitly optimizes the encoder to reduce the domain gap between natural images and medical images. Furthermore, EA2D intensely exploits the decoder’s capabilities to improve AD performance. We introduce a self-attention skip connection to augment the reconstruction quality of normal cases, thereby magnifying the distinction between normal and abnormal samples. Additionally, we propose using dual decoders to reconstruct dual views of an image, leveraging diverse perspectives while mitigating the over-reconstruction issue of anomalies in AD. Extensive experiments across four medical image modalities demonstrates the superiority of our EA2D in various medical scenarios. Our method’s code will be released at <uri>https://github.com/TumCCC/E2AD</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3370-3382"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis","authors":"Jiaxin Zhuang;Linshan Wu;Qiong Wang;Peng Fei;Varut Vardhanabhuti;Lin Luo;Hao Chen","doi":"10.1109/TMI.2025.3564382","DOIUrl":"10.1109/TMI.2025.3564382","url":null,"abstract":"The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Masked AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously ranging at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. Furthermore, we adopt a hybrid backbone to enhance the hierarchical representation learning efficiently during the pre-training. MiM was pre-trained on a large scale of available 3D volumetric images, i.e., Computed Tomography (CT) images containing various body parts. Extensive experiments on twelve public datasets demonstrate the superiority of MiM over other SSL methods in organ/tumor segmentation and disease classification. We further scale up the MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. Code is available at <uri>https://github.com/JiaxinZhuang/MiM</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 9","pages":"3727-3740"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty Quantification and Quality Control for Heatmap-Based Landmark Detection Models","authors":"Yong Feng;Jinzhu Yang;Lingzhi Tang;Song Sun;Yonghuai Wang","doi":"10.1109/TMI.2025.3564267","DOIUrl":"10.1109/TMI.2025.3564267","url":null,"abstract":"Uncertainty quantification is a vital aspect of explainable artificial intelligence that fosters clinician trust in medical applications and facilitates timely interventions, leading to safer and more reliable outcomes. Although deep learning models have reached clinically acceptable accuracy in anatomical landmark detection, their predictions remain susceptible to contextual noise due to the small size of the target structures, making uncertainty quantification more challenging than in classification and segmentation tasks. This paper presents an end-to-end uncertainty quantification method tailored for heatmap-based anatomical landmark detection models, designed to improve both interpretability and controllability in clinical applications. Leveraging Dempster-Shafer Theory and Subjective Logic Theory, we implement probability assignment and uncertainty quantification through a single forward pass to ensure computational efficiency. We introduce an evidence map that captures the strength of landmark evidence, alongside an uncertainty map that calibrates predicted probabilities within the Subjective Logic framework. The interaction between these two components, facilitated by a cross-attention mechanism, further improves landmark detection accuracy and enhances the effectiveness of uncertainty quantification. Experimental results demonstrate that the proposed method maintains detection accuracy, even in noisy environments, while outperforming state-of-the-art methods in terms of uncertainty quantification and quality control. Furthermore, the model effectively identifies out-of-distribution data solely through calibrated probabilities when encountering inconsistencies in multi-center data and novel data, underscoring its potential for clinical applications. The source code is available at github.com/warmestwind/CalibratedSL","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3451-3463"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monitoring Knee Health: Ultra-Wideband Radar Imaging for Early Detection of Osteoarthritis","authors":"Kapil Gangwar;Robert S. C. Winter;Fatemeh Modares Sabzevari;Gary C.-Y. Chen;Kevin K.-M. Chan;Karumudi Rambabu","doi":"10.1109/TMI.2025.3564521","DOIUrl":"10.1109/TMI.2025.3564521","url":null,"abstract":"This paper presents a non-invasive method and study for analyzing knee osteoarthritis, encompassing a dual-step approach: a) the employment of synthetic aperture radar (SAR)-based microwave reflection tomography for imaging the knee joint, and b) the application of an ultra-wideband (UWB) radar technique combined with a genetic algorithm to determine muscle electrical properties (permittivity) and the gap between the femur (thighbone) and tibia (shinbone). The assessment of osteoarthritis is conducted by integrating the outcomes of the knee joint imaging, change in muscle permittivity, and inter-bone spacing. This technique undergoes initial validation on simplified knee models, subsequently extending to adult human voxel knee tissues as represented in CST software. Experimental validation involves analyzing a porcine knee joint comprising sequential layers of skin, fat, muscle, and bone. Both simulated and experimental validations suggest that this technique is viable, safe, and cost-effective for estimating knee osteoarthritis in humans.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3464-3475"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}