Evaluation of AI-based auto-contouring tools in radiotherapy: A single-institution study

Tingyu Wang, James Tam, Thomas Chum, Cyril Tai, Deborah C. Marshall, Michael Buckstein, Jerry Liu, Sheryl Green, Robert D. Stewart, Tian Liu, Ming Chao

Journal of Applied Clinical Medical Physics, 26(4), published 2025-01-21. DOI: 10.1002/acm2.14620
Abstract
Background
Accurate delineation of organs at risk (OARs) is crucial yet time-consuming in the radiotherapy treatment planning workflow. Modern artificial intelligence (AI) technologies have made automation of OAR contouring feasible. This report details a single institution's experience in evaluating two commercial auto-contouring software tools and making well-informed decisions about their clinical adoption.
Methods
A cohort of 36 patients previously treated at our institution was selected for the software performance assessment. Fifty-eight OAR structures from seven disease sites were automatically segmented with each tool. Five radiation oncologists with different specialties qualitatively scored the automatic OAR contours' clinical usability on a four-level scale (0–3), termed the quality score (QS), ranging from "0: not usable" to "3: directly usable in the clinic." Additionally, a quantitative comparison with clinically approved contours using the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95) was performed to complement the physicians' QS.
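The two quantitative metrics used in the comparison can be sketched as follows. This is a minimal illustration with NumPy/SciPy, not the study's implementation; it approximates HD95 on full binary masks (rather than extracted surfaces), and voxel spacing is a parameter the caller must supply.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance, given voxel spacing.

    Approximation: distances are taken from every foreground voxel of one
    mask to the nearest foreground voxel of the other (full-mask variant,
    not a surface-only computation).
    """
    dt_a = distance_transform_edt(~a, sampling=spacing)  # distance to nearest voxel of a
    dt_b = distance_transform_edt(~b, sampling=spacing)  # distance to nearest voxel of b
    d_ab = dt_b[a]  # distances from a's voxels to b
    d_ba = dt_a[b]  # distances from b's voxels to a
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)
```

DSC rewards volumetric overlap (1.0 for identical masks), while HD95 penalizes outlying boundary disagreement in physical units, which is why the two metrics are reported together.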
Result
Software A achieved an average QS of 2.17 ± 0.69, comparable to Software B's average QS of 2.17 ± 0.72. Software B yielded more OARs that required only minor or no modification than Software A (42 vs. 37). Major modifications were needed for 13 of the 58 automated contours from both tools. The DSC and HD95 scores for the two tools were likewise comparable, with DSC of 0.67 ± 0.23 versus 0.66 ± 0.21 and HD95 of 13.07 ± 15.84 versus 15.55 ± 18.45 for Software A and Software B, respectively. Correlation coefficients between the physician scores and the quantitative metrics suggested that the contouring results from Software A aligned more closely with the physicians' evaluations.
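The correlation analysis between physician scores and quantitative metrics can be sketched as below. The per-structure values are hypothetical, purely for illustration; a rank correlation such as Spearman's is a natural choice here because QS is an ordinal 0–3 scale.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-structure scores (illustrative only; not the study's data)
qs  = np.array([3, 2, 3, 1, 0, 2, 3, 1])                          # physician QS (0-3)
dsc = np.array([0.92, 0.71, 0.88, 0.45, 0.20, 0.66, 0.95, 0.40])  # matching DSC values

# Spearman rank correlation handles the ordinal QS scale and ties
rho, p = spearmanr(qs, dsc)
```

A higher rho indicates that the quantitative metric tracks the physicians' qualitative judgment more faithfully, which is the sense in which Software A "aligned more closely" with the evaluations.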
Conclusion
Based on our study, either software tool could produce clinically acceptable contours for about 65% of the OAR structures. However, further refinement is necessary for several challenging OARs to improve model performance.
Journal overview:
Journal of Applied Clinical Medical Physics is an international Open Access publication dedicated to clinical medical physics. JACMP welcomes original contributions dealing with all aspects of medical physics from scientists working in clinical medical physics around the world. JACMP accepts only online submissions.
JACMP will publish:
-Original Contributions: Peer-reviewed investigations that represent new and significant contributions to the field. Recommended word count: up to 7500.
-Review Articles: Reviews of major areas or sub-areas in the field of clinical medical physics. These articles may be of any length and are peer reviewed.
-Technical Notes: These should be no longer than 3000 words, including key references.
-Letters to the Editor: Comments on papers published in JACMP or on any other matters of interest to clinical medical physics. These should be no more than 1250 words (including references), and publication is at the discretion of the editor, who may occasionally consult experts on the merits of the contents.
-Book Reviews: The editorial office solicits Book Reviews.
-Announcements of Forthcoming Meetings: The Editor may provide notice of forthcoming meetings, course offerings, and other events relevant to clinical medical physics.
-Parallel Opposed Editorials: We welcome topics relevant to clinical practice and the medical physics profession. The content may be a controversial debate or opposing aspects of an issue: one author argues for the position and the other against. Each side of the debate contains an opening statement of up to 800 words, followed by a rebuttal of up to 500 words. Readers interested in participating in this series should contact the moderator with a proposed title and a short description of the topic.