Explainability-informed benchmarking of two deep learning models for organ-at-risk segmentation in MR-guided adaptive radiotherapy

Impact Factor: 2.0 · Q3, Radiology, Nuclear Medicine & Medical Imaging
H. Sekkat, A. Khallouqi, Y. Hammouga, A. Halimi, O. El mouden, A. Bannan, Y. Berrada, O. El rhazouani
Journal of Medical Imaging and Radiation Sciences, 57(3), Article 102200. DOI: 10.1016/j.jmir.2026.102200. Published 2026-05-01 (Epub 2026-02-13).

Abstract

Introduction/Background

Segmentation of gastrointestinal (GI) organs-at-risk (OARs) is a critical yet time-consuming step in MR-guided adaptive radiotherapy (MRgRT), with manual delineation prone to inter- and intra-observer variability. While deep learning approaches have shown promise, their clinical adoption requires not only accuracy but also interpretability and reliability. This study benchmarks two widely used convolutional architectures, U-Net and Residual U-Net (ResUNet), for abdominal OAR segmentation, with an emphasis on explainability-oriented quantitative analysis.

Methods

U-Net and ResUNet were trained and evaluated on an anonymized abdominal MRI dataset using a 5-fold stratified group cross-validation strategy. Segmentation performance was assessed using the Dice Similarity Coefficient (DSC), Intersection-over-Union (IoU), and the 95th percentile Hausdorff Distance (HD95). Explainability was investigated using Gradient-weighted Class Activation Mapping (Grad-CAM) computed from the final convolutional layer of each network. To enable objective analysis beyond qualitative visualization, Grad-CAM activation maps were quantified using numerical localization metrics relative to ground-truth organ masks, including in-organ energy ratio, boundary energy ratio, pointing accuracy, activation Dice coefficient, centroid distance, and activation entropy. Grad-CAM metrics were aggregated across gastrointestinal organs and averaged over the five validation folds.
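The overlap and Grad-CAM localization metrics named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 20%-of-peak activation threshold, and the two-pixel boundary band width are assumptions introduced here for concreteness.

```python
import numpy as np
from scipy import ndimage


def dice_iou(pred, gt):
    """DSC and IoU between two boolean masks."""
    inter = (pred & gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-12)
    iou = inter / ((pred | gt).sum() + 1e-12)
    return float(dice), float(iou)


def gradcam_localization_metrics(heatmap, mask, top_fraction=0.2):
    """Quantify how well a Grad-CAM heatmap localizes an organ mask.

    heatmap : 2D non-negative float array of Grad-CAM activations.
    mask    : 2D boolean ground-truth organ mask.
    The thresholds and band width are illustrative choices, not the
    paper's exact definitions.
    """
    h = heatmap.astype(float)
    total = h.sum() + 1e-12

    # In-organ energy ratio: fraction of activation energy inside the mask.
    in_organ = float(h[mask].sum() / total)

    # Boundary energy ratio: activation energy in a thin band (+/- 2 px)
    # around the mask boundary.
    band = (ndimage.binary_dilation(mask, iterations=2)
            & ~ndimage.binary_erosion(mask, iterations=2))
    boundary = float(h[band].sum() / total)

    # Pointing accuracy: does the hottest pixel fall inside the organ?
    peak = np.unravel_index(np.argmax(h), h.shape)
    pointing_hit = bool(mask[peak])

    # Activation Dice: overlap between strongly activated pixels
    # (here: >= 20% of the peak value) and the ground-truth mask.
    act = h >= top_fraction * h.max()
    activation_dice = float(2.0 * (act & mask).sum()
                            / (act.sum() + mask.sum() + 1e-12))

    # Centroid distance between activation and mask centers of mass.
    c_act = ndimage.center_of_mass(h)
    c_mask = ndimage.center_of_mass(mask.astype(float))
    centroid_dist = float(np.hypot(c_act[0] - c_mask[0],
                                   c_act[1] - c_mask[1]))

    # Activation entropy: Shannon entropy of the normalized heatmap
    # (lower values indicate more focused attention).
    p = (h / total).ravel()
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum())

    return {"in_organ_energy": in_organ, "boundary_energy": boundary,
            "pointing_hit": pointing_hit, "activation_dice": activation_dice,
            "centroid_distance": centroid_dist, "activation_entropy": entropy}
```

In a per-fold evaluation, these per-slice values would then be averaged per organ and across the five validation folds, as described above.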

Results

Both architectures demonstrated comparable segmentation performance across organs, with no statistically significant differences across the evaluated metrics. Grad-CAM analysis showed similar region-level attention patterns, with in-organ activation ratios of 71.4 ± 8.6% for U-Net and 66.2 ± 9.1% for ResUNet, boundary energy ratios of 24.1 ± 4.9% and 21.8 ± 5.2%, respectively, and pointing accuracies exceeding 70% for both models. Uncertainty analysis based on inter-fold variability and boundary error dispersion indicated comparable stability and bounded worst-case behavior for the two models.

Discussion/Conclusion

By integrating quantitative indicators of performance, uncertainty, and explainability, this study provides an informed benchmark of two deep learning models for abdominal OAR segmentation. The results suggest that both U-Net and ResUNet exhibit stable and interpretable behavior under the evaluated configurations, supporting their potential use in MR-guided adaptive radiotherapy workflows where reliability and clinical trust are essential.
Journal of Medical Imaging and Radiation Sciences
CiteScore: 2.30 · Self-citation rate: 11.10% · Articles per year: 231 · Review time: 53 days
Journal description: Journal of Medical Imaging and Radiation Sciences is the official peer-reviewed journal of the Canadian Association of Medical Radiation Technologists. The journal is published four times a year and is circulated to approximately 11,000 medical radiation technologists, libraries, and radiology departments throughout Canada, the United States, and overseas. The journal publishes articles on recent research, new technology and techniques, professional practices, and technologists' viewpoints, as well as relevant book reviews.