Towards robust deep learning-based autosegmentation in MRI-planned gynecological brachytherapy: Importance of scalable development and comprehensive evaluation

IF 1.8 · CAS Zone 4 (Medicine) · JCR Q4 ONCOLOGY

Brachytherapy · Pub Date: 2026-03-01 · Epub Date: 2026-01-21 · DOI: 10.1016/j.brachy.2025.12.007

Patricia Jule Oliva, Shrimanti Ghosh, Fleur Huang, Ericka Wiebe, Julie Cuartero, Sunita Ghosh, Pierre Boulanger, Jihyun Yun, Kumaradevan Punithakumar, Geetha Menon

Brachytherapy, Volume 25, Issue 2, Pages 361-372. Full text: https://www.sciencedirect.com/science/article/pii/S1538472125003770

Citations: 0

Abstract

PURPOSE

To present comprehensive development and evaluation methodologies for a generalizable deep learning (DL)-driven autocontouring model of standard pelvic organs-at-risk (OARs) in MRI-planned cervical brachytherapy.

MATERIALS AND METHODS

A curated dataset of 200 3D MRIs (85% training/validation, 15% testing), spanning multiple applicator types and varied treated anatomies, with manual OAR contours (bladder, rectum, sigmoid, small bowel) drawn by 3 physicians, was used to develop an nnU-Net-based autocontouring model. Hyperparameters were tuned iteratively to identify the optimal configuration and improve evaluation metrics. Model performance was assessed with quantitative geometric metrics (e.g., Dice coefficient (DC) and 95th-percentile Hausdorff distance (HD95)) and dosimetric metrics (dose-volume histograms (DVHs) and differences in D2cc, the minimum dose to the most exposed 2 cm³ (ΔD2cc)), which were then correlated with qualitative physician review (modified Turing and Likert tests).
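The two geometric metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's evaluation code: the abstract does not specify how HD95 was computed, so the choices here (6-connected surface voxels, pooling both directed surface-distance sets before taking the 95th percentile, brute-force distances) are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice coefficient DC = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates (mm) of boundary voxels of a 3D mask (6-connectivity)."""
    mask = mask.astype(bool)
    # Pad with False so voxels on the array edge see "outside" neighbours;
    # values wrapped around by np.roll only land in the discarded padding.
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(3):
        # A voxel is interior only if both neighbours along every axis are inside.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    boundary = mask & ~interior[1:-1, 1:-1, 1:-1]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def hd95(pred: np.ndarray, ref: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile Hausdorff distance (mm), pooling both directed surface distances."""
    a = surface_points(pred, spacing)
    b = surface_points(ref, spacing)
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # undefined when either contour is empty
    # Brute-force pairwise distances: fine for small examples, too memory-hungry for full MRIs.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    directed = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(directed, 95))


if __name__ == "__main__":
    ref = np.zeros((20, 20, 20), dtype=bool)
    ref[5:15, 5:15, 5:15] = True          # 10 x 10 x 10 voxel "organ"
    pred = np.roll(ref, 1, axis=0)        # same shape shifted by one voxel
    print(dice_coefficient(pred, ref))    # 0.9
    print(hd95(pred, ref))                # 1.0
```

Passing the real MRI voxel spacing matters because HD95 is reported in millimetres; for full-size volumes, a Euclidean distance transform would normally replace the pairwise distance matrix.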

RESULTS

Geometric metrics were best for bladder (mean ± SD DC | HD95 (mm): 0.93 ± 0.02 | 2.26 ± 1.07), with greater variability for small bowel (0.62 ± 0.16 | 24.90 ± 14.36). Dosimetric comparison of manual vs predicted contours showed high agreement in DVHs, with mean ΔD2cc < 0.60 Gy EQD2₃ across all OARs. Model performance was consistent irrespective of applicator type, OAR volume, or contourer. Quantitative scores favoring the DL model (DLM) were not always matched by equally favorable qualitative results, yet physician review showed clinical acceptability (80% for bladder and rectum).
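ΔD2cc compares D2cc (the minimum dose to the most exposed 2 cm³ of an organ) between manual and predicted contours, and reporting it in EQD2₃ means converting with the linear-quadratic model at α/β = 3 Gy, EQD2 = n·d·(d + α/β)/(2 + α/β). The sketch below is illustrative only: it assumes a simple voxel-count DVH without interpolation and identical fractions (the same per-fraction D2cc repeated n times). Clinical planning systems and the study's actual fractionation may differ, and the function names and numbers are hypothetical.

```python
import math

import numpy as np


def d2cc(organ_doses_gy, voxel_volume_cc: float) -> float:
    """Minimum dose (Gy) to the hottest 2 cm^3 of an organ, from per-voxel doses.

    Simple voxel-count DVH with no interpolation (an assumption; TPS DVHs interpolate).
    """
    doses = np.sort(np.asarray(organ_doses_gy, dtype=float).ravel())[::-1]  # hottest first
    n_voxels = math.ceil(round(2.0 / voxel_volume_cc, 9))  # voxels needed to reach 2 cc
    if n_voxels > doses.size:
        raise ValueError("organ is smaller than 2 cc")
    return float(doses[n_voxels - 1])


def eqd2(dose_per_fraction_gy: float, n_fractions: int, alpha_beta_gy: float = 3.0) -> float:
    """Linear-quadratic EQD2 = n * d * (d + a/b) / (2 + a/b); a/b = 3 Gy gives EQD2_3."""
    d = dose_per_fraction_gy
    return n_fractions * d * (d + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def delta_d2cc_eqd2(manual_doses, predicted_doses, voxel_volume_cc, n_fractions,
                    alpha_beta_gy=3.0) -> float:
    """Predicted-minus-manual D2cc in Gy EQD2, assuming identical fractions (hypothetical helper)."""
    manual = eqd2(d2cc(manual_doses, voxel_volume_cc), n_fractions, alpha_beta_gy)
    predicted = eqd2(d2cc(predicted_doses, voxel_volume_cc), n_fractions, alpha_beta_gy)
    return predicted - manual


if __name__ == "__main__":
    # Hypothetical per-fraction voxel doses (Gy) inside manual vs predicted contours;
    # the 0.5 cc voxels are unrealistically large, chosen to keep the arithmetic visible.
    manual = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
    predicted = np.array([6.0, 5.0, 4.0, 3.5, 2.0, 1.0])
    print(d2cc(manual, voxel_volume_cc=0.5))   # 3.0 (4 hottest voxels = 2 cc)
    print(eqd2(2.0, 10))                       # 20.0: 2 Gy fractions are unchanged
    print(delta_d2cc_eqd2(manual, predicted, 0.5, 4))
```

The 2 Gy check is a useful sanity test: by construction, EQD2 equals the physical dose when every fraction is 2 Gy, whatever α/β is.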

CONCLUSION

The DL-based autocontouring model, trained on a heterogeneous in-house dataset, demonstrated clinical acceptability for OARs under comprehensive evaluation. It also shows promise for translation to target contouring and adaptation to other gynecological (noncervix) brachytherapy applications. Qualitative and quantitative results can differ; the direction and magnitude of these differences should be considered when assessing the clinical usability of brachytherapy autocontouring models.
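One way to make the direction and magnitude of qualitative-quantitative disagreement concrete is a rank correlation between a geometric score and physician ratings. The abstract does not say which statistic the authors used, so Spearman's ρ here is an assumption, the data are invented, and the helper names are hypothetical. Average ranks are used because Likert scores contain many ties.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks where tied values share their average rank (common with Likert data)."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size, dtype=float)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks.

    Returns NaN (with a RuntimeWarning) if either input is constant.
    """
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


if __name__ == "__main__":
    # Hypothetical per-case Dice scores and physician Likert scores (1-5) for one OAR.
    dice = [0.91, 0.94, 0.88, 0.95, 0.90, 0.93]
    likert = [4, 5, 3, 4, 4, 5]
    rho = spearman_rho(dice, likert)
    print(f"Spearman rho = {rho:.2f}")  # sign = direction, |rho| = strength
```

A clearly positive ρ means higher Dice tends to accompany higher ratings; a weak or negative ρ flags an OAR where the two kinds of evaluation diverge and deserve separate weighting.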

Source journal: Brachytherapy (Medicine / Nuclear Medicine)

CiteScore: 3.40
Self-citation rate: 21.10%
Articles published: 119
Review time: 9.1 weeks

Journal description: Brachytherapy is an international and multidisciplinary journal that publishes original peer-reviewed articles and selected reviews on the techniques and clinical applications of interstitial and intracavitary radiation in the management of cancers. Laboratory and experimental research relevant to clinical practice is also included. Related disciplines include medical physics, medical oncology, radiation oncology, and radiology. Brachytherapy publishes technical advances, original articles, reviews, and point/counterpoint on controversial issues. Original articles that address any aspect of brachytherapy are invited. Letters to the Editor-in-Chief are encouraged.