H. Sekkat, A. Khallouqi, Y. Hammouga, A. Halimi, O. El mouden, A. Bannan, Y. Berrada, O. El rhazouani

Explainability-informed benchmarking of two deep learning models for organ-at-risk segmentation in MR-guided adaptive radiotherapy

Journal of Medical Imaging and Radiation Sciences, vol. 57, no. 3, Article 102200. Published online 13 February 2026; issue date 1 May 2026. DOI: 10.1016/j.jmir.2026.102200
Cited by: 0
Abstract
Introduction/Background
Segmentation of gastrointestinal (GI) organs-at-risk (OARs) is a critical yet time-consuming step in MR-guided adaptive radiotherapy (MRgRT), with manual delineation prone to inter- and intra-observer variability. While deep learning approaches have shown promise, their clinical adoption requires not only accuracy but also interpretability and reliability. This study benchmarks two widely used convolutional architectures, U-Net and Residual U-Net (ResUNet), for abdominal OAR segmentation, with an emphasis on explainability-oriented quantitative analysis.
Methods
U-Net and ResUNet were trained and evaluated on an anonymized abdominal MRI dataset using a 5-fold stratified group cross-validation strategy. Segmentation performance was assessed using the Dice Similarity Coefficient (DSC), Intersection-over-Union (IoU), and the 95th percentile Hausdorff Distance (HD95). Explainability was investigated using Gradient-weighted Class Activation Mapping (Grad-CAM) computed from the final convolutional layer of each network. To enable objective analysis beyond qualitative visualization, Grad-CAM activation maps were quantified using numerical localization metrics relative to ground-truth organ masks, including in-organ energy ratio, boundary energy ratio, pointing accuracy, activation Dice coefficient, centroid distance, and activation entropy. Grad-CAM metrics were aggregated across gastrointestinal organs and averaged over the five validation folds.
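The segmentation metrics named above (DSC, IoU, HD95) can be computed directly from binary masks. The sketch below is illustrative only, not the authors' implementation; it computes the Hausdorff distance over full voxel sets via Euclidean distance transforms, a common simplification of the surface-based variant, and the function names are placeholders.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(pred, gt):
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection-over-Union (Jaccard index) between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def hd95(pred, gt, spacing=(1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance (set-based approximation).

    Distances are measured in physical units via the voxel `spacing`.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    dt_gt = distance_transform_edt(~gt, sampling=spacing)    # distance to GT
    dt_pred = distance_transform_edt(~pred, sampling=spacing)  # distance to pred
    d_pred_to_gt = dt_gt[pred]   # each predicted voxel's distance to GT set
    d_gt_to_pred = dt_pred[gt]   # each GT voxel's distance to predicted set
    return float(np.percentile(np.hstack([d_pred_to_gt, d_gt_to_pred]), 95))
```

For example, two identical masks yield DSC = IoU = 1.0 and HD95 = 0.0, while a one-voxel shift of a 4x4 mask yields DSC = 0.75 and IoU = 0.6.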
Results
Both architectures demonstrated comparable segmentation performance across organs, with no statistically significant differences across evaluated metrics. Grad-CAM analysis showed similar region-level attention patterns, with in-organ activation ratios of 71.4 ± 8.6% for U-Net and 66.2 ± 9.1% for ResUNet, boundary energy ratios of 24.1 ± 4.9% and 21.8 ± 5.2%, respectively, and pointing accuracies exceeding 70% for both models. Uncertainty analysis based on inter-fold variability and boundary error dispersion indicated comparable stability and bounded worst-case behavior.
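The Grad-CAM localization metrics reported above (in-organ energy ratio, pointing accuracy, centroid distance, activation entropy) reduce to simple operations on a non-negative activation map and a ground-truth organ mask. A minimal sketch, under the assumption that the CAM has been upsampled to the mask resolution; function names and the per-map formulations are illustrative, not the paper's code.

```python
import numpy as np

def in_organ_energy_ratio(cam, mask):
    """Fraction of total activation energy falling inside the organ mask."""
    cam = np.clip(cam, 0, None)
    total = cam.sum()
    return float(cam[mask.astype(bool)].sum() / total) if total > 0 else 0.0

def pointing_hit(cam, mask):
    """Pointing-game check: does the peak activation land inside the mask?"""
    peak = np.unravel_index(np.argmax(cam), cam.shape)
    return bool(mask[peak])

def centroid_distance(cam, mask, spacing=(1.0, 1.0)):
    """Distance between the activation-weighted centroid and the mask centroid."""
    cam = np.clip(cam, 0, None)
    coords = np.indices(cam.shape).reshape(cam.ndim, -1).T  # (N, ndim)
    c_cam = (coords * cam.reshape(-1, 1)).sum(axis=0) / cam.sum()
    c_mask = coords[mask.reshape(-1).astype(bool)].mean(axis=0)
    return float(np.linalg.norm((c_cam - c_mask) * np.asarray(spacing)))

def activation_entropy(cam):
    """Shannon entropy of the normalized activation map (spread of attention)."""
    p = np.clip(cam, 0, None).ravel()
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

A sharply focused map concentrated inside the organ gives an in-organ energy ratio near 1, a pointing hit, a small centroid distance, and low entropy; a diffuse map drives the ratio down and the entropy toward log(N).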
Discussion/Conclusion
By integrating quantitative indicators of performance, uncertainty, and explainability, this study provides an informed benchmarking of two deep learning models for abdominal OAR segmentation. The results suggest that both U-Net and ResUNet exhibit stable and interpretable behavior under the evaluated configurations, supporting their potential use in MR-guided adaptive radiotherapy workflows where reliability and clinical trust are essential.
About the journal:
Journal of Medical Imaging and Radiation Sciences is the official peer-reviewed journal of the Canadian Association of Medical Radiation Technologists. The journal is published four times a year and is circulated to approximately 11,000 medical radiation technologists, libraries, and radiology departments throughout Canada, the United States, and overseas. It publishes articles on recent research, new technology and techniques, professional practices, and technologists' viewpoints, as well as relevant book reviews.