{"title":"UN-SAM: Domain-adaptive self-prompt segmentation for universal nuclei images","authors":"Zhen Chen , Qing Xu , Xinyu Liu , Yixuan Yuan","doi":"10.1016/j.media.2025.103607","DOIUrl":"10.1016/j.media.2025.103607","url":null,"abstract":"<div><div>In digital pathology, precise nuclei segmentation is pivotal yet challenged by the diversity of tissue types, staining protocols, and imaging conditions. Recently, the segment anything model (SAM) revealed overwhelming performance in natural scenarios and impressive adaptation to medical imaging. Despite these advantages, the reliance on labor-intensive manual annotation as segmentation prompts severely hinders their clinical applicability, especially for nuclei image analysis containing massive cells where dense manual prompts are impractical. To overcome the limitations of current SAM methods while retaining the advantages, we propose the domain-adaptive self-prompt SAM framework for Universal Nuclei segmentation (UN-SAM), by providing a fully automated solution with superior performance across different domains. Specifically, to eliminate the labor-intensive requirement of per-nuclei annotations for prompt, we devise a multi-scale Self-Prompt Generation (SPGen) module to revolutionize clinical workflow by automatically generating high-quality mask hints to guide the segmentation tasks. Moreover, to unleash the capability of SAM across a variety of nuclei images, we devise a Domain-adaptive Tuning Encoder (DT-Encoder) to seamlessly harmonize visual features with domain-common and domain-specific knowledge, and further devise a Domain Query-enhanced Decoder (DQ-Decoder) by leveraging learnable domain queries for segmentation decoding in different nuclei domains. Extensive experiments prove that our UN-SAM surpasses state-of-the-arts in nuclei instance and semantic segmentation, especially the generalization capability on unseen nuclei domains. The source code is available at <span><span>https://github.com/CUHK-AIM-Group/UN-SAM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103607"},"PeriodicalIF":10.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel spatial-temporal image fusion method for augmented reality-based endoscopic surgery","authors":"Haochen Shi , Jiangchang Xu , Haitao Li , Shuanglin Jiang , Chaoyu Lei , Huifang Zhou , Yinwei Li , Xiaojun Chen","doi":"10.1016/j.media.2025.103609","DOIUrl":"10.1016/j.media.2025.103609","url":null,"abstract":"<div><div>Augmented reality (AR) has significant potential to enhance the identification of critical locations during endoscopic surgeries, where accurate endoscope calibration is essential for ensuring the quality of augmented images. In optical-based surgical navigation systems, asynchrony between the optical tracker and the endoscope can cause the augmented scene to diverge from reality during rapid movements, potentially misleading the surgeon—a challenge that remains unresolved. In this paper, we propose a novel spatial–temporal endoscope calibration method that simultaneously determines the spatial transformation from the image to the optical marker and the temporal latency between the tracking and image acquisition systems. To estimate temporal latency, we utilize a Monte Carlo method to estimate the intrinsic parameters of the endoscope’s imaging system, leveraging a dataset of thousands of calibration samples. This dataset is larger than those typically employed in conventional camera calibration routines, rendering traditional algorithms computationally infeasible within a reasonable timeframe. By introducing latency as an independent variable into the principal equation of hand-eye calibration, we developed a weighted algorithm to iteratively solve the equation. This approach eliminates the need for a fixture to stabilize the endoscope during calibration, allowing for quicker calibration through handheld flexible movement. Experimental results demonstrate that our method achieves an average 2D error of <span><math><mrow><mn>7</mn><mo>±</mo><mn>3</mn></mrow></math></span> pixels and a pseudo-3D error of <span><math><mrow><mn>1</mn><mo>.</mo><mn>2</mn><mo>±</mo><mn>0</mn><mo>.</mo><mn>4</mn><mspace></mspace><mi>mm</mi></mrow></math></span> for stable scenes within <span><math><mrow><mn>82</mn><mo>.</mo><mn>4</mn><mo>±</mo><mn>16</mn><mo>.</mo><mn>6</mn></mrow></math></span> seconds—approximately 68% faster in operation time than conventional methods. In dynamic scenes, our method compensates for the virtual-to-reality latency of <span><math><mrow><mn>11</mn><mo>±</mo><mn>2</mn><mspace></mspace><mi>ms</mi></mrow></math></span>, which is shorter than a single frame interval and 5.7 times shorter than the uncompensated conventional method. Finally, we successfully integrated the proposed method into our surgical navigation system and validated its feasibility in clinical trials for transnasal optic canal decompression surgery. Our method has the potential to improve the safety and efficacy of endoscopic surgeries, leading to better patient outcomes.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103609"},"PeriodicalIF":10.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143911737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MambaMIM: Pre-training Mamba with state space token interpolation and its application to medical image segmentation","authors":"Fenghe Tang , Bingkun Nian , Yingtai Li , Zihang Jiang , Jie Yang , Wei Liu , S. Kevin Zhou","doi":"10.1016/j.media.2025.103606","DOIUrl":"10.1016/j.media.2025.103606","url":null,"abstract":"<div><div>Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba’s potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel <strong>TOKen-Interpolation</strong> strategy (TOKI) for the selective structure state space sequence, which learns causal relationships of state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy to maintain a <strong>masking consistency</strong> across different architectures and can be used on any single or hybrid Mamba architecture to enhance its multi-scale and long-range representation capability. We pre-train MambaMIM on a large-scale dataset of 6.8K CT scans and evaluate its performance across eight public medical segmentation benchmarks. Extensive downstream experiments reveal the feasibility and advancement of using Mamba for medical image pre-training. In particular, when we apply the MambaMIM to a customized architecture that hybridizes MedNeXt and Vision Mamba, we consistently obtain the state-of-the-art segmentation performance. The code is available at: <span><span>https://github.com/FengheTan9/MambaMIM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103606"},"PeriodicalIF":10.7,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural uncertainty estimation for medical image segmentation","authors":"Bing Yang , Xiaoqing Zhang , Huihong Zhang , Sanqian Li , Risa Higashita , Jiang Liu","doi":"10.1016/j.media.2025.103602","DOIUrl":"10.1016/j.media.2025.103602","url":null,"abstract":"<div><div>Precise segmentation and uncertainty estimation are crucial for error identification and correction in medical diagnostic assistance. Existing methods mainly rely on pixel-wise uncertainty estimations. They (1) neglect the global context, leading to erroneous uncertainty indications, and (2) bring attention interference, resulting in the waste of extensive details and potential understanding confusion. In this paper, we propose a novel structural uncertainty estimation method, based on Convolutional Neural Networks (CNN) and Active Shape Models (ASM), named SU-ASM, which incorporates global shape information for providing precise segmentation and uncertainty estimation. The SU-ASM consists of three components. Firstly, multi-task generation provides multiple outcomes to assist ASM initialization and shape optimization via a multi-task learning module. Secondly, information fusion involves the creation of a Combined Boundary Probability (CBP) and along with a rapid shape initialization algorithm, Key Landmark Template Matching (KLTM), to enhance boundary reliability and select appropriate shape templates. Finally, shape model fitting where multiple shape templates are matched to the CBP while maintaining their intrinsic shape characteristics. Fitted shapes generate segmentation results and structural uncertainty estimations. The SU-ASM has been validated on cardiac ultrasound dataset, ciliary muscle dataset of the anterior eye segment, and the chest X-ray dataset. It outperforms state-of-the-art methods in terms of segmentation and uncertainty estimation.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103602"},"PeriodicalIF":10.7,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143887922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MED-NCA: Bio-inspired medical image segmentation","authors":"John Kalkhof, Niklas Ihm, Tim Köhler, Bjarne Gregori, Anirban Mukhopadhyay","doi":"10.1016/j.media.2025.103601","DOIUrl":"10.1016/j.media.2025.103601","url":null,"abstract":"<div><div>The reliance on computationally intensive U-Net and Transformer architectures significantly limits their accessibility in low-resource environments, creating a technological divide that hinders global healthcare equity, especially in medical diagnostics and treatment planning. This divide is most pronounced in low- and middle-income countries, primary care facilities, and conflict zones. We introduced MED-NCA, Neural Cellular Automata (NCA) based segmentation models characterized by their low parameter count, robust performance, and inherent quality control mechanisms. These features drastically lower the barriers to high-quality medical image analysis in resource-constrained settings, allowing the models to run efficiently on hardware as minimal as a Raspberry Pi or a smartphone. Building upon the foundation laid by MED-NCA, this paper extends its validation across eight distinct anatomies, including the hippocampus and prostate (MRI, 3D), liver and spleen (CT, 3D), heart and lung (X-ray, 2D), breast tumor (Ultrasound, 2D), and skin lesion (Image, 2D). Our comprehensive evaluation demonstrates the broad applicability and effectiveness of MED-NCA in various medical imaging contexts, matching the performance of two magnitudes larger UNet models. Additionally, we introduce NCA-VIS, a visualization tool that gives insight into the inference process of MED-NCA and allows users to test its robustness by applying various artifacts. This combination of efficiency, broad applicability, and enhanced interpretability makes MED-NCA a transformative solution for medical image analysis, fostering greater global healthcare equity by making advanced diagnostics accessible in even the most resource-limited environments.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103601"},"PeriodicalIF":10.7,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medical image translation with deep learning: Advances, datasets and perspectives","authors":"Junxin Chen , Zhiheng Ye , Renlong Zhang , Hao Li , Bo Fang , Li-bo Zhang , Wei Wang","doi":"10.1016/j.media.2025.103605","DOIUrl":"10.1016/j.media.2025.103605","url":null,"abstract":"<div><div>Traditional medical image generation often lacks patient-specific clinical information, limiting its clinical utility despite enhancing downstream task performance. In contrast, medical image translation precisely converts images from one modality to another, preserving both anatomical structures and cross-modal features, thus enabling efficient and accurate modality transfer and offering unique advantages for model development and clinical practice. This paper reviews the latest advancements in deep learning(DL)-based medical image translation. Initially, it elaborates on the diverse tasks and practical applications of medical image translation. Subsequently, it provides an overview of fundamental models, including convolutional neural networks (CNNs), transformers, and state space models (SSMs). Additionally, it delves into generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Autoregressive Models (ARs), diffusion Models, and flow Models. Evaluation metrics for assessing translation quality are discussed, emphasizing their importance. Commonly used datasets in this field are also analyzed, highlighting their unique characteristics and applications. Looking ahead, the paper identifies future trends, challenges, and proposes research directions and solutions in medical image translation. It aims to serve as a valuable reference and inspiration for researchers, driving continued progress and innovation in this area.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103605"},"PeriodicalIF":10.7,"publicationDate":"2025-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143890509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"General retinal image enhancement via reconstruction: Bridging distribution shifts using latent diffusion adaptors","authors":"Bingyu Yang, Haonan Han, Weihang Zhang, Huiqi Li","doi":"10.1016/j.media.2025.103603","DOIUrl":"10.1016/j.media.2025.103603","url":null,"abstract":"<div><div>Deep learning-based fundus image enhancement has attracted extensive research attention recently, which has shown remarkable effectiveness in improving the visibility of low-quality images. However, these methods are often constrained to specific datasets and degradations, leading to poor generalization capabilities and having challenges in the fine-tuning process. Therefore, a general method for fundus image enhancement is proposed for improved generalizability and flexibility, which decomposes the enhancement task into reconstruction and adaptation phases. In the reconstruction phase, self-supervised training with unpaired data is employed, allowing the utilization of extensive public datasets to improve the generalizability of the model. During the adaptation phase, the model is fine-tuned according to the target datasets and their degradations, utilizing the pre-trained weights from the reconstruction. The proposed method improves the feasibility of latent diffusion models for retinal image enhancement. Adaptation loss and enhancement adaptor are proposed in autoencoders and diffusion networks for fewer paired training data, fewer trainable parameters, and faster convergence compared with training from scratch. The proposed method can be easily fine-tuned and experiments demonstrate the adaptability for different datasets and degradations. Additionally, the reconstruction-adaptation framework can be utilized in different backbones and other modalities, which shows its generality.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103603"},"PeriodicalIF":10.7,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lung structure and function information-guided residual diffusion model for predicting idiopathic pulmonary fibrosis progression","authors":"Caiwen Jiang , Xiaodan Xing , Yang Nan , Yingying Fang , Sheng Zhang , Simon Walsh , Guang Yang , Dinggang Shen","doi":"10.1016/j.media.2025.103604","DOIUrl":"10.1016/j.media.2025.103604","url":null,"abstract":"<div><div>Idiopathic Pulmonary Fibrosis (IPF) is a progressive lung disease that continuously scars and thickens lung tissue, leading to respiratory difficulties. Timely assessment of IPF progression is essential for developing treatment plans and improving patient survival rates. However, current clinical standards require multiple (usually two) CT scans at certain intervals to assess disease progression. This presents a dilemma: <em>the disease progression is identified only after the disease has already progressed</em>. To address this issue, a feasible solution is to generate the follow-up CT image from the patient’s initial CT image to achieve early prediction of IPF. To this end, we propose a lung structure and function information-guided residual diffusion model. The key components of our model include (1) using a 2.5D generation strategy to reduce computational cost of generating 3D images with the diffusion model; (2) designing structural attention to mitigate negative impact of spatial misalignment between the two CT images on generation performance; (3) employing residual diffusion to accelerate model training and inference while focusing more on differences between the two CT images (i.e., the lesion areas); and (4) developing a CLIP-based text extraction module to extract lung function test information and further using such extracted information to guide the generation. Extensive experiments demonstrate that our method can effectively predict IPF progression and achieve superior generation performance compared to state-of-the-art methods.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103604"},"PeriodicalIF":10.7,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143890511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Recon-all-clinical”: Cortical surface reconstruction and analysis of heterogeneous clinical brain MRI","authors":"Karthik Gopinath , Douglas N. Greve , Colin Magdamo , Steve Arnold , Sudeshna Das , Oula Puonti , Juan Eugenio Iglesias , Alzheimer’s Disease Neuroimaging Initiative","doi":"10.1016/j.media.2025.103608","DOIUrl":"10.1016/j.media.2025.103608","url":null,"abstract":"<div><div>Surface-based analysis of the cerebral cortex is ubiquitous in human neuroimaging with MRI. It is crucial for tasks like cortical registration, parcellation, and thickness estimation. Traditionally, such analyses require high-resolution, isotropic scans with good gray–white matter contrast, typically a T1-weighted scan with 1 mm resolution. This requirement precludes application of these techniques to most MRI scans acquired for clinical purposes, since they are often anisotropic and lack the required T1-weighted contrast. To overcome this limitation and enable large-scale neuroimaging studies using vast amounts of existing clinical data, we introduce <em>recon-all-clinical</em>, a novel methodology for cortical reconstruction, registration, parcellation, and thickness estimation for clinical brain MRI scans of any resolution and contrast. Our approach employs a hybrid analysis method that combines a convolutional neural network (CNN) trained with domain randomization to predict signed distance functions (SDFs), and classical geometry processing for accurate surface placement while maintaining topological and geometric constraints. The method does not require retraining for different acquisitions, thus simplifying the analysis of heterogeneous clinical datasets. We evaluated <em>recon-all-clinical</em> on multiple public datasets like ADNI, HCP, AIBL, OASIS and including a large clinical dataset of over 9,500 scans. The results indicate that our method produces geometrically precise cortical reconstructions across different MRI contrasts and resolutions, consistently achieving high accuracy in parcellation. Cortical thickness estimates are precise enough to capture aging effects, independently of MRI contrast, even though accuracy varies with slice thickness. Our method is publicly available at <span><span>https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all-clinical</span><svg><path></path></svg></span>, enabling researchers to perform detailed cortical analysis on the huge amounts of already existing clinical MRI scans. This advancement may be particularly valuable for studying rare diseases and underrepresented populations where research-grade MRI data is scarce.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103608"},"PeriodicalIF":10.7,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProtoASNet: Comprehensive evaluation and enhanced performance with uncertainty estimation for aortic stenosis classification in echocardiography","authors":"Ang Nan Gu , Hooman Vaseli , Michael Y. Tsang , Victoria Wu , S. Neda Ahmadi Amiri , Nima Kondori , Andrea Fung , Teresa S.M. Tsang , Purang Abolmaesumi","doi":"10.1016/j.media.2025.103600","DOIUrl":"10.1016/j.media.2025.103600","url":null,"abstract":"<div><div>Aortic stenosis (AS) is a prevalent heart valve disease that requires accurate and timely diagnosis for effective treatment. Current methods for automated AS severity classification rely on black-box deep learning techniques, which suffer from a low level of trustworthiness and hinder clinical adoption. To tackle this challenge, we propose ProtoASNet, a prototype-based neural network designed to classify the severity of AS from B-mode echocardiography videos. ProtoASNet bases its predictions exclusively on the similarity scores between the input and a set of learned spatio-temporal prototypes, ensuring inherent interpretability. Users can directly visualize the similarity between the input and each prototype, as well as the weighted sum of similarities. This approach provides clinically relevant evidence for each prediction, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet utilizes abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This feature augments prototype-based models with the ability to explain when they may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset. It surpasses existing state-of-the-art methods, achieving a balanced accuracy of 80.0% on our private dataset and 79.7% on the TMED-2 dataset, respectively. By discarding cases flagged as uncertain, ProtoASNet achieves an improved balanced accuracy of 82.4% on our private dataset. Furthermore, by offering interpretability and an uncertainty measure for each prediction, ProtoASNet improves transparency and facilitates the interactive usage of deep networks in aiding clinical decision-making. Our source code is available at: <span><span>https://github.com/hooman007/ProtoASNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103600"},"PeriodicalIF":10.7,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}