Automatic contour quality assurance using deep-learning based contours.
Barbara Marquez, David Fuentes, Christine B Peterson, Dong Joo Rhee, Raphael J Douglas, Raymond P Mumme, Anuja Jhingran, Julianne M Pollard, Surendra Prajapati, Thomas Whitaker, Laurence E Court
Physics in Medicine and Biology. Journal Article. Published 2025-06-18. DOI: 10.1088/1361-6560/ade5e6 (https://doi.org/10.1088/1361-6560/ade5e6)
Abstract
Objective. Safe deployment of auto-contouring models requires the inclusion of automated QA. One such approach is to use two independent auto-contouring models and compare them geometrically for acceptability. This is not effective because geometric differences may not correlate with clinically significant errors. Herein, we investigated whether a two-contour QA system is improved by including dose in this comparison.
Approach. VMAT plans were generated for 86 head and neck (H&N) and 50 cervical (GYN) cancer patients, using clinically approved PTVs and auto-contoured OARs from a primary auto-contouring model. Doses to the primary OARs were compared with doses to manually drawn and approved OARs ("the truth"). A difference in Dmean or Dmax ≥ 2 Gy was identified as a reporting error (Derror). A second, independent auto-contouring model was then used to contour the OARs (verification). The primary and verification auto-contouring models were compared geometrically (DSC, sDSC, HD95, MSD) and dosimetrically (Dmean, Dmax). The ability of comparison metrics between the two auto-contouring models to flag actual dosimetric errors (i.e. primary model compared with the truth) was investigated. A logistic regression model was used to predict Derror. The data were divided by disease site and split 50/50 into stratified training and testing sets; k-fold cross-validation was employed during training to avoid overfitting. H&N structures were further divided into size-specific groups to improve model performance and generalizability.
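The error definition and the simplest of the geometric metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the `OarDose` container, the function names, and the voxel-set representation of a contour are hypothetical, and the sketch assumes that the 2 Gy criterion applies to the absolute dose difference and is evaluated separately for Dmean and Dmax.

```python
from dataclasses import dataclass

DOSE_ERROR_THRESHOLD_GY = 2.0  # threshold stated in the abstract


@dataclass
class OarDose:
    """Dose summary for one OAR contour (hypothetical container)."""
    d_mean: float  # mean dose in Gy
    d_max: float   # maximum dose in Gy


def dose_error_flags(primary: OarDose, truth: OarDose,
                     threshold: float = DOSE_ERROR_THRESHOLD_GY) -> dict:
    """Flag Dmean and Dmax separately when |primary - truth| >= threshold.

    Assumption: the criterion is applied to the absolute difference.
    """
    return {
        "mean": abs(primary.d_mean - truth.d_mean) >= threshold,
        "max": abs(primary.d_max - truth.d_max) >= threshold,
    }


def dice(mask_a: set, mask_b: set) -> float:
    """Dice similarity coefficient (DSC) between two sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


# Illustrative values only, not data from the study.
primary = OarDose(d_mean=24.5, d_max=41.0)
truth = OarDose(d_mean=22.0, d_max=41.5)
flags = dose_error_flags(primary, truth)  # Dmean differs by 2.5 Gy, Dmax by 0.5 Gy

contour_a = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
contour_b = {(0, 0, 1), (0, 1, 0), (1, 1, 0)}
dsc = dice(contour_a, contour_b)  # 2 shared voxels out of 3 + 3
```

In the setup described above, flags of this kind (primary model versus the truth) would serve as the labels, while the same metrics computed between the primary and verification contours would serve as the predictors for the logistic regression.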
Main Results. Including dose metrics in the logistic regression model to predict Derror,mean increased performance, in terms of ROC-AUC and AU-PRC, in the test set for H&N small structures. For Derror,max, including dose metrics increased performance for H&N small structures, H&N medium structures, and GYN structures.
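For readers unfamiliar with the two reported metrics, the following Python sketch shows one common way to compute them from binary error labels and predicted probabilities. It is a generic illustration and not the authors' evaluation code: ROC-AUC is computed as the fraction of correctly ordered positive/negative pairs, and average precision is used as one standard estimate of the area under the precision-recall curve. The function names and example values are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision at each true positive.

    Assumption: tied scores are ranked in input order rather than grouped.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, y in ranked if y)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Illustrative values only: 1 marks a true dose error, scores are
# predicted error probabilities from some classifier.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)            # 5 of 6 pos/neg pairs ordered correctly
ap = average_precision(labels, scores)   # (1/1 + 2/3) / 2
```

When true errors are rare, the precision-recall summary is often the more informative of the two, because ROC-AUC can stay high even when most flagged structures are false alarms.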
Significance. In many instances, utilizing dose with geometric comparisons can improve the ability of a verification model to flag potential errors from a primary auto-contouring model.
About the journal:
The development and application of theoretical, computational and experimental physics to medicine, physiology and biology. Topics covered are: therapy physics (including ionizing and non-ionizing radiation); biomedical imaging (e.g. x-ray, magnetic resonance, ultrasound, optical and nuclear imaging); image-guided interventions; image reconstruction and analysis (including kinetic modelling); artificial intelligence in biomedical physics and analysis; nanoparticles in imaging and therapy; radiobiology; radiation protection and patient dose monitoring; radiation dosimetry