Uncertainty-aware segmentation quality prediction via deep learning Bayesian Modeling: Comprehensive evaluation and interpretation on skin cancer and liver segmentation

IF 5.4 · Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL

Computerized Medical Imaging and Graphics Pub Date : 2025-04-13 DOI:10.1016/j.compmedimag.2025.102547

Sikha O.K., Meritxell Riera-Marín, Adrian Galdran, Javier García López, Júlia Rodríguez-Comas, Gemma Piella, Miguel A. González Ballester

Volume 123, Article 102547. Publisher page: https://www.sciencedirect.com/science/article/pii/S0895611125000564

Abstract

Image segmentation is a critical step in computational biomedical image analysis, typically evaluated using metrics like the Dice coefficient during training and validation. However, in clinical settings without manual annotations, assessing segmentation quality becomes challenging, and models lacking reliability indicators face adoption barriers. To address this gap, we propose a novel framework for predicting segmentation quality without requiring ground truth annotations at test time. Our approach introduces two complementary frameworks: one leveraging predicted segmentation and uncertainty maps, and another integrating the original input image, uncertainty maps, and predicted segmentation maps. We present Bayesian adaptations of two benchmark segmentation models (SwinUNet and Feature Pyramid Network with ResNet50) using Monte Carlo Dropout, Ensemble, and Test Time Augmentation to quantify uncertainty. We evaluate four uncertainty estimates (confidence map, entropy, mutual information, and expected pairwise Kullback–Leibler divergence) on 2D skin lesion and 3D liver segmentation datasets, analyzing their correlation with segmentation quality metrics. Our framework achieves an R² score of 93.25 and a Pearson correlation of 96.58 on the HAM10000 dataset, outperforming previous segmentation quality assessment methods. For 3D liver segmentation, Test Time Augmentation with entropy achieves an R² score of 85.03 and a Pearson correlation of 65.02, demonstrating cross-modality robustness. Additionally, we propose an aggregation strategy that combines multiple uncertainty estimates into a single score per image, offering a more robust and comprehensive assessment of segmentation quality than evaluating each measure independently. The proposed uncertainty-aware segmentation quality prediction network is interpreted using gradient-based methods such as Grad-CAM and feature embedding analysis through UMAP. These techniques provide insights into the model’s behavior and reliability, helping to assess the impact of incorporating uncertainty into the segmentation quality prediction pipeline. The code is available at: https://github.com/sikha2552/Uncertainty-Aware-Segmentation-Quality-Prediction-Bayesian-Modeling-with-Comprehensive-Evaluation-.
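
To make the four uncertainty estimates named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (see their repository for that). It assumes the stochastic predictions (from Monte Carlo Dropout passes, ensemble members, or augmented inputs) have already been collected as softmax probabilities in an array of shape (T, C, H, W). The function names `uncertainty_maps`, `image_scores`, `pearson_r`, and `r2`, the `EPS` constant, and the use of a plain pixel mean as the per-image score are all illustrative assumptions; the abstract does not specify the paper's aggregation strategy.

```python
import numpy as np

EPS = 1e-12  # avoids log(0); an illustrative choice, not taken from the paper


def uncertainty_maps(probs):
    """Per-pixel uncertainty from T >= 2 stochastic softmax outputs.

    probs: float array of shape (T, C, H, W). Each of the T samples could be
    an MC Dropout pass, an ensemble member, or a prediction on an augmented
    input (assumed to be mapped back to the original image geometry).
    """
    T = probs.shape[0]
    log_p = np.log(probs + EPS)
    mean_p = probs.mean(axis=0)                               # (C, H, W)

    confidence = mean_p.max(axis=0)                           # top-class probability
    entropy = -(mean_p * np.log(mean_p + EPS)).sum(axis=0)    # predictive entropy
    expected_entropy = -(probs * log_p).sum(axis=1).mean(axis=0)
    mutual_information = entropy - expected_entropy

    # Mean KL(p_i || p_j) over ordered pairs i != j, in closed form:
    # sum_ij sum_c p_i (log p_i - log p_j)
    #   = T * sum_i sum_c p_i log p_i - sum_c (sum_i p_i) (sum_j log p_j)
    self_term = T * (probs * log_p).sum(axis=(0, 1))
    cross_term = (probs.sum(axis=0) * log_p.sum(axis=0)).sum(axis=0)
    epkl = (self_term - cross_term) / (T * (T - 1))

    return {
        "confidence": confidence,
        "entropy": entropy,
        "mutual_information": mutual_information,
        "epkl": epkl,
    }


def image_scores(maps):
    """Collapse each (H, W) map to one number per image (plain mean; an assumption)."""
    return {name: float(m.mean()) for name, m in maps.items()}


def pearson_r(x, y):
    """Pearson correlation between two 1-D sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for T=8 stochastic predictions, C=2 classes, 4x4 image.
    samples = rng.dirichlet(np.ones(2), size=(8, 4, 4)).transpose(0, 3, 1, 2)
    print(image_scores(uncertainty_maps(samples)))
```

Given per-image uncertainty scores and the corresponding Dice values on an annotated validation set, `pearson_r` and `r2` can then be used to check how well a score (or a regressor trained on it) tracks segmentation quality, which is the kind of correlation analysis the abstract describes.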

Source journal

Computerized Medical Imaging and Graphics (Medicine: Nuclear Medicine)

CiteScore: 10.70

Self-citation rate: 3.50%

Review time: 26 days

Journal description: The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. The journal includes articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.