Geometric and Dosimetric Evaluation of a RayStation Deep Learning Model for Auto-Segmentation of Organs at Risk in a Real-World Head and Neck Cancer Dataset
D. Sharma, G. Singh, N. Burela, S. Gayen, G. Aishwarya, S. Nangia
Clinical Oncology, volume 41, Article 103796. Published 2025-03-01. DOI: 10.1016/j.clon.2025.103796. URL: https://www.sciencedirect.com/science/article/pii/S0936655525000512
Abstract
Aims
To assess the geometric accuracy and dosimetric impact of a deep learning segmentation (DLS) model on a large, diverse dataset of head and neck cancer (HNC) patients treated with intensity-modulated proton therapy (IMPT).
Materials and methods
A 3D U-Net-based DLS model was applied to the CT datasets of 124 HNC patients treated with IMPT at 50.4–70.0 GyRBE. Thirty organs at risk (OARs), delineated manually (GT-OARs), were compared with auto-segmented OARs, both without (DLS-nonedited) and with (DLS-edited) manual correction, using volume, Dice similarity coefficient (DSC), and Hausdorff distance (HD). The dosimetric impact of auto-segmentation error was assessed as the absolute difference in mean dose (ΔDmean) and maximum dose (ΔDmax).
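As an illustration of the metrics named above, the following is a minimal sketch of how DSC, HD, ΔDmean, and ΔDmax could be computed for one OAR. It is not the study's implementation. The representation of a structure as a set of voxel indices, the function names, the `spacing` parameter, and the toy values are all assumptions made for this example. The sign convention (DLS minus GT) is also an assumption, and the brute-force Hausdorff search over every voxel is chosen for clarity rather than speed; the abstract does not state which HD variant was used.

```python
from math import dist


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty structures agree
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance in mm (brute force over all voxels)."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty structure")

    def to_mm(p):
        # Scale voxel indices by the (assumed) voxel spacing in mm.
        return tuple(i * s for i, s in zip(p, spacing))

    pa = [to_mm(p) for p in a]
    pb = [to_mm(p) for p in b]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


def dose_stats(dose, structure):
    """Mean and maximum dose (cGyRBE) over the voxels of a structure."""
    values = [dose[v] for v in structure]
    return sum(values) / len(values), max(values)


def dose_differences(dose, gt, dls):
    """(ΔDmean, ΔDmax) as DLS minus GT; the sign convention is assumed."""
    gt_mean, gt_max = dose_stats(dose, gt)
    dls_mean, dls_max = dose_stats(dose, dls)
    return dls_mean - gt_mean, dls_max - gt_max


# Toy data (hypothetical, not from the study): a 2x2 manual contour and
# an auto-segmented contour shifted by one voxel along x.
gt = {(x, y, 0) for x in (0, 1) for y in (0, 1)}
dls = {(x, y, 0) for x in (1, 2) for y in (0, 1)}
dose = {(x, y, 0): d for x, d in ((0, 100.0), (1, 200.0), (2, 400.0)) for y in (0, 1)}

print(dice(gt, dls))                    # 0.5
print(hausdorff(gt, dls))               # 1.0
print(dose_differences(dose, gt, dls))  # (150.0, 200.0)
```

In the toy case a one-voxel shift halves the overlap (DSC 0.5) while HD stays at a single voxel width, which shows why the two metrics can grade the same contour differently.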
Results
The cohort included patients with postoperative status (47.6%), flap reconstruction (12.1%), mouth bites (79.8%), dental implants (54.8%), and surgical implants (3.2%). DLS failed in 11 patients with significant anatomical challenges and artifacts. Compared with GT-OARs, DLS-nonedited under-segmented 11/12 Gr-A OARs (central nervous system, arteries, bone) (p < 0.05) and over-segmented 13/18 Gr-B OARs (glandular, digestive, airways). DSC was good (>0.8), intermediate (0.6–0.8), intermediate–poor (0.5–0.6), and poor (<0.5) in 12, 6, 4, and 8 OARs, respectively. HD was good (<4 mm), intermediate (4–6 mm), poor (6–8 mm), and very poor (>8 mm) in 5, 7, 4, and 14 OARs, respectively. Compared with the manually corrected DLS-edited OARs, all DLS-nonedited OARs showed excellent similarity, with DSC >0.8 and HD <4 mm. On average, auto-segmentation took 2.51 minutes and manual correction took 6.24 minutes. The mean ΔDmean and ΔDmax were within ±300 and ±3 cGyRBE, respectively, except for the oesophagus and larynx, where the mean ΔDmean reached 837.14 cGyRBE.
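The grading bands quoted in the results can be written as a small classifier. This is only a sketch: the function names are hypothetical, the abstract does not say which band a boundary value (for example a DSC of exactly 0.8, or an HD of exactly 4 mm) belongs to, and the boundary handling below is therefore an assumption. The per-OAR values in the usage example are invented for illustration and are not results from the study.

```python
from collections import Counter


def grade_dsc(dsc):
    """Map a Dice similarity coefficient to the bands used in the abstract."""
    if dsc > 0.8:
        return "good"
    if dsc >= 0.6:  # assumed: 0.6 and 0.8 both count as intermediate
        return "intermediate"
    if dsc >= 0.5:  # assumed: 0.5 counts as intermediate-poor
        return "intermediate-poor"
    return "poor"


def grade_hd(hd_mm):
    """Map a Hausdorff distance in mm to the bands used in the abstract."""
    if hd_mm < 4:
        return "good"
    if hd_mm <= 6:  # assumed: 4 mm and 6 mm both count as intermediate
        return "intermediate"
    if hd_mm <= 8:  # assumed: 8 mm counts as poor
        return "poor"
    return "very poor"


# Hypothetical per-OAR values, for illustration only.
example_dsc = {"oar_1": 0.91, "oar_2": 0.72, "oar_3": 0.55, "oar_4": 0.31}
example_hd = {"oar_1": 2.5, "oar_2": 5.0, "oar_3": 7.2, "oar_4": 12.0}

dsc_tally = Counter(grade_dsc(v) for v in example_dsc.values())
hd_tally = Counter(grade_hd(v) for v in example_hd.values())
print(dict(dsc_tally))
print(dict(hd_tally))
```

Tallying the grades per OAR in this way yields counts of the kind reported above, where each set of four counts (12, 6, 4, 8 for DSC and 5, 7, 4, 14 for HD) sums to the 30 OARs analysed.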
Conclusion
Patient posture, nonbiological materials, and anatomical deformities influence DLS accuracy. The model's overall performance is adequate and efficient, with skilled manual editing needed for only a few OARs.
About the journal
Clinical Oncology is an international cancer journal covering all aspects of the clinical management of cancer patients, reflecting a multidisciplinary approach to therapy. Papers, editorials and reviews are published on all types of malignant disease, embracing pathology, diagnosis and treatment, including radiotherapy, chemotherapy, surgery, combined modality treatment and palliative care. Research and review papers covering epidemiology, radiobiology, radiation physics, tumour biology, and immunology are also published, together with letters to the editor, case reports and book reviews.