Automatic contour quality assurance using deep-learning based contours.

IF 3.3 | Zone 3 (Medicine) | JCR Q2: ENGINEERING, BIOMEDICAL

Physics in Medicine and Biology | Pub Date: 2025-06-18 | DOI: 10.1088/1361-6560/ade5e6

Barbara Marquez, David Fuentes, Christine B Peterson, Dong Joo Rhee, Raphael J Douglas, Raymond P Mumme, Anuja Jhingran, Julianne M Pollard, Surendra Prajapati, Thomas Whitaker, Laurence E Court

Citations: 0

Abstract

Objective. Safe deployment of auto-contouring models requires automated quality assurance (QA). One approach is to use two independent auto-contouring models and compare their contours geometrically for acceptability. This is not effective because geometric differences may not correlate with clinically significant errors. Herein, we investigated whether a two-contour QA system is improved by including dose in this comparison.

Approach. Volumetric modulated arc therapy (VMAT) plans were generated for 86 head and neck (H&N) and 50 cervical (GYN) cancer patients, using clinically approved planning target volumes (PTVs) and auto-contoured organs at risk (OARs) from a primary auto-contouring model. Doses to the primary OARs were compared with doses to manually drawn and approved OARs ("the truth"). A difference in Dmean or Dmax ≥ 2 Gy was identified as a reporting error (Derror). A second, independent auto-contouring model was then used to contour the OARs (verification). The primary and verification auto-contouring models were compared geometrically (DSC, sDSC, HD95, MSD) and dosimetrically (Dmean, Dmax). The ability of comparison metrics between the two auto-contouring models to flag actual dosimetric errors (i.e. primary model compared with the truth) was investigated. A logistic regression model was used to predict Derror. The data were divided by disease site and into 50/50 stratified training and testing sets; k-fold cross-validation was employed during training to avoid overfitting. H&N structures were further divided into size-specific groups to improve model performance and generalizability.

Main Results. For Derror,mean, including dose metrics in the logistic regression model increased performance, in terms of ROC-AUC and AU-PRC on the test set, for H&N small structures. For Derror,max, including dose metrics increased performance for H&N small structures, H&N medium structures, and GYN structures.

Significance. In many instances, combining dose with geometric comparisons can improve the ability of a verification model to flag potential errors from a primary auto-contouring model.
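The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy voxel data, the use of an absolute dose difference, and the use of 1 - DSC as a disagreement score are all assumptions made for illustration. It computes a Dice similarity coefficient (DSC) between two contours, applies the 2 Gy threshold to label Derror,mean and Derror,max, and scores how well a primary-versus-verification metric ranks true errors using a rank-based ROC-AUC.

```python
from statistics import mean


def dice(a: set, b: set) -> float:
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def dose_error_flags(dose_primary, dose_truth, threshold_gy=2.0):
    """Label Derror,mean and Derror,max for one structure.

    Inputs are per-voxel doses (Gy) inside the primary auto-contour and
    inside the manually approved contour. Taking the absolute difference
    is an assumption; the abstract only states the 2 Gy threshold.
    """
    d_mean = abs(mean(dose_primary) - mean(dose_truth))
    d_max = abs(max(dose_primary) - max(dose_truth))
    return {"mean": d_mean >= threshold_gy, "max": d_max >= threshold_gy}


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data, not from the paper).
primary_voxels = {1, 2, 3, 4}
verification_voxels = {3, 4, 5, 6}
print(dice(primary_voxels, verification_voxels))  # 0.5

flags = dose_error_flags([10, 12, 14], [10, 12, 17.5])
print(flags)  # {'mean': False, 'max': True}

# Disagreement score per structure (here 1 - DSC between the primary and
# verification contours) against the true Derror labels.
disagreement = [0.5, 0.1, 0.15, 0.2]
is_error = [True, False, True, False]
print(roc_auc(disagreement, is_error))  # 0.75
```

In the study, several geometric and dose metrics are combined in a logistic regression model rather than scored one at a time. The single-metric ROC-AUC above only shows the kind of question being asked: does disagreement between the two auto-contouring models rank the true dosimetric errors above the non-errors?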

Source journal

Physics in Medicine and Biology (Medicine, Engineering: Biomedical)

CiteScore: 6.50

Self-citation rate: 14.30%

Articles published: 409

Review time: 2 months

Journal description: The development and application of theoretical, computational and experimental physics to medicine, physiology and biology. Topics covered are: therapy physics (including ionizing and non-ionizing radiation); biomedical imaging (e.g. x-ray, magnetic resonance, ultrasound, optical and nuclear imaging); image-guided interventions; image reconstruction and analysis (including kinetic modelling); artificial intelligence in biomedical physics and analysis; nanoparticles in imaging and therapy; radiobiology; radiation protection and patient dose monitoring; radiation dosimetry