Automated confidence estimation in deep learning auto-segmentation for brain organs at risk on MRI for radiotherapy
Nouf M. Alzahrani, Ann M. Henry, Bashar M. Al-Qaisieh, Louise J. Murray, Michael G. Nix
Journal of Applied Clinical Medical Physics, volume 25, issue 12. Published 2024-09-16.
DOI: 10.1002/acm2.14513
https://onlinelibrary.wiley.com/doi/10.1002/acm2.14513
Abstract
Purpose
We built AutoConfidence (ACo), a novel AI-driven quality assurance (QA) method that estimates segmentation confidence on a per-voxel basis without gold-standard segmentations, enabling robust, efficient review of automated segmentation (AS). We demonstrated the method for AS of brain organs at risk (OARs) on MRI, using internal and external (third-party) AS models.
Methods
Thirty-two retrospective, MRI-planned glioma cases were randomly selected from a local clinical cohort for ACo training. A generator was trained adversarially to produce internal autosegmentations (IAS), with a discriminator estimating voxel-wise IAS uncertainty given the input MRI. Confidence maps for each proposed segmentation were produced for operator use in AS editing and were compared with “difference to gold-standard” error maps. Nine cases were used to test ACo performance on IAS and to validate it on predictions from two external deep learning segmentation models: an external model with low-quality AS (EM-LQ) and an external model with high-quality AS (EM-HQ). Matthews correlation coefficient (MCC), false-positive rate (FPR), false-negative rate (FNR), and visual assessment were used for evaluation. Edge removal and geometric distance corrections were applied to achieve more useful and clinically relevant confidence maps and performance metrics.
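As an illustration of the evaluation metrics named above, the following minimal sketch computes MCC, FPR, and FNR by comparing a binary "flagged as uncertain" mask against a binary "differs from gold standard" error mask. It is not the authors' implementation: the function name, the flattened-list inputs, and the assumption that the confidence map has already been thresholded into a binary mask are all hypothetical, and the edge removal and geometric distance corrections are not modelled.

```python
import math


def confusion_metrics(predicted_error, true_error):
    """Compare two equal-length, flattened binary voxel masks.

    predicted_error: 1 where the (already thresholded, assumed) confidence
                     map flags a voxel as likely wrong, else 0.
    true_error:      1 where the segmentation differs from the gold
                     standard, else 0.
    Returns a tuple (mcc, fpr, fnr).
    """
    if len(predicted_error) != len(true_error):
        raise ValueError("masks must contain the same number of voxels")

    tp = fp = tn = fn = 0
    for p, t in zip(predicted_error, true_error):
        if p and t:
            tp += 1          # flagged, and truly an error
        elif p and not t:
            fp += 1          # flagged, but actually correct
        elif not p and t:
            fn += 1          # missed error
        else:
            tn += 1          # correctly left unflagged

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # FPR = FP / (FP + TN); FNR = FN / (FN + TP).
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return mcc, fpr, fnr


# Toy example with eight voxels (illustrative values only, not study data).
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
truth = [1, 0, 0, 0, 0, 1, 1, 0]
mcc, fpr, fnr = confusion_metrics(predicted, truth)
print(f"MCC={mcc:.3f} FPR={fpr:.3f} FNR={fnr:.3f}")
```

MCC is a natural choice here because erroneous voxels are typically a small minority of the volume, and MCC stays informative under that kind of class imbalance where plain accuracy would not.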
Results
ACo showed generally excellent performance on both internal and external segmentations across all OARs except the lenses. MCC was higher on IAS and on low-quality external segmentations (EM-LQ) than on high-quality ones (EM-HQ). On IAS and EM-LQ, average MCC (excluding lenses) varied from 0.6 to 0.9, while average FPR and FNR were ≤0.13 and ≤0.21, respectively. For EM-HQ, average MCC varied from 0.4 to 0.8, while average FPR and FNR were ≤0.37 and ≤0.22, respectively.
Conclusion
ACo was a reliable predictor of uncertainty and errors on AS generated both internally and externally, demonstrating its potential as an independent, reference-free QA tool, which could help operators deliver robust, efficient autosegmentation in the radiotherapy clinic.