Jimmy Fontaine, Maud Suszko, Francesca di Franco, Agathe Leroux, Emilie Bonnet, Mathieu Bosset, Julien Langrand-Escure, Sébastien Clippe, Bertrand Fleury, Jean-Baptiste Guy
{"title":"基于人工智能的自动轮廓软件在日常实践中的鲁棒性评价。","authors":"Jimmy Fontaine, Maud Suszko, Francesca di Franco, Agathe Leroux, Emilie Bonnet, Mathieu Bosset, Julien Langrand-Escure, Sébastien Clippe, Bertrand Fleury, Jean-Baptiste Guy","doi":"10.1016/j.ejmp.2025.105065","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>AI-based automatic contouring streamlines radiotherapy by reducing contouring time but requires rigorous validation and ongoing daily monitoring. This study assessed how software updates affect contouring accuracy and examined how image quality variations influence AI performance.</p><p><strong>Methods: </strong>Two patient cohorts were analyzed. The software updates cohort (40 CT scans: 20 thorax, 10 pelvis, 10 H&N) compared six versions of Limbus AI contouring software. The image quality cohort (20 patients: H&N, pelvis, brain, thorax) analyzed 12 reconstructions per patient using Standard, iDose, and IMR algorithms, with simulated noise and spatial resolution (SR) degradations. AI performance was assessed using Volumetric Dice Similarity Coefficient (vDSC) and 95 % Hausdorff Distance (HD95%) with Wilcoxon tests for significance.</p><p><strong>Results: </strong>In the software updates cohort, vDSC improved for re-trained structures across versions (mean DSC ≥ 0.75), with breast contour vDSC decreasing by 1 % between v1.5 and v1.8B3 (p > 0.05). Median HD95% values were consistently <4 mm, <5 mm, and <12 mm for H&N, pelvis, and thorax contours, respectively (p > 0.05). In the image quality cohort, no significant differences were observed between Standard, iDose, and IMR algorithms. However, noise and SR degradation significantly reduced performance: vDSC ≥ 0.9 dropped from 89 % at 2 % noise to 30 % at 20 %, and from 87 % to 70 % as SR degradation increased (p < 0.001).</p><p><strong>Conclusion: </strong>AI contouring accuracy improved with software updates and showed robustness to minor reconstruction variations, but it was sensitive to noise and SR degradation. Continuous validation and quality control of AI-generated contours are essential. Future studies should include a broader range of anatomical regions and larger cohorts.</p>","PeriodicalId":56092,"journal":{"name":"Physica Medica-European Journal of Medical Physics","volume":"137 ","pages":"105065"},"PeriodicalIF":2.7000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Robustness evaluation of an artificial intelligence-based automatic contouring software in daily routine practice.\",\"authors\":\"Jimmy Fontaine, Maud Suszko, Francesca di Franco, Agathe Leroux, Emilie Bonnet, Mathieu Bosset, Julien Langrand-Escure, Sébastien Clippe, Bertrand Fleury, Jean-Baptiste Guy\",\"doi\":\"10.1016/j.ejmp.2025.105065\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>AI-based automatic contouring streamlines radiotherapy by reducing contouring time but requires rigorous validation and ongoing daily monitoring. This study assessed how software updates affect contouring accuracy and examined how image quality variations influence AI performance.</p><p><strong>Methods: </strong>Two patient cohorts were analyzed. The software updates cohort (40 CT scans: 20 thorax, 10 pelvis, 10 H&N) compared six versions of Limbus AI contouring software. The image quality cohort (20 patients: H&N, pelvis, brain, thorax) analyzed 12 reconstructions per patient using Standard, iDose, and IMR algorithms, with simulated noise and spatial resolution (SR) degradations. AI performance was assessed using Volumetric Dice Similarity Coefficient (vDSC) and 95 % Hausdorff Distance (HD95%) with Wilcoxon tests for significance.</p><p><strong>Results: </strong>In the software updates cohort, vDSC improved for re-trained structures across versions (mean DSC ≥ 0.75), with breast contour vDSC decreasing by 1 % between v1.5 and v1.8B3 (p > 0.05). Median HD95% values were consistently <4 mm, <5 mm, and <12 mm for H&N, pelvis, and thorax contours, respectively (p > 0.05). In the image quality cohort, no significant differences were observed between Standard, iDose, and IMR algorithms. However, noise and SR degradation significantly reduced performance: vDSC ≥ 0.9 dropped from 89 % at 2 % noise to 30 % at 20 %, and from 87 % to 70 % as SR degradation increased (p < 0.001).</p><p><strong>Conclusion: </strong>AI contouring accuracy improved with software updates and showed robustness to minor reconstruction variations, but it was sensitive to noise and SR degradation. Continuous validation and quality control of AI-generated contours are essential. Future studies should include a broader range of anatomical regions and larger cohorts.</p>\",\"PeriodicalId\":56092,\"journal\":{\"name\":\"Physica Medica-European Journal of Medical Physics\",\"volume\":\"137 \",\"pages\":\"105065\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Physica Medica-European Journal of Medical Physics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.ejmp.2025.105065\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/7 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physica Medica-European Journal of Medical Physics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.ejmp.2025.105065","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/7 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Robustness evaluation of an artificial intelligence-based automatic contouring software in daily routine practice.
Purpose: AI-based automatic contouring streamlines radiotherapy by reducing contouring time but requires rigorous validation and ongoing daily monitoring. This study assessed how software updates affect contouring accuracy and examined how image quality variations influence AI performance.
Methods: Two patient cohorts were analyzed. The software updates cohort (40 CT scans: 20 thorax, 10 pelvis, 10 H&N) compared six versions of Limbus AI contouring software. The image quality cohort (20 patients: H&N, pelvis, brain, thorax) analyzed 12 reconstructions per patient using Standard, iDose, and IMR algorithms, with simulated noise and spatial resolution (SR) degradations. AI performance was assessed using Volumetric Dice Similarity Coefficient (vDSC) and 95 % Hausdorff Distance (HD95%) with Wilcoxon tests for significance.
Results: In the software updates cohort, vDSC improved for re-trained structures across versions (mean DSC ≥ 0.75), with breast contour vDSC decreasing by 1 % between v1.5 and v1.8B3 (p > 0.05). Median HD95% values were consistently <4 mm, <5 mm, and <12 mm for H&N, pelvis, and thorax contours, respectively (p > 0.05). In the image quality cohort, no significant differences were observed between Standard, iDose, and IMR algorithms. However, noise and SR degradation significantly reduced performance: vDSC ≥ 0.9 dropped from 89 % at 2 % noise to 30 % at 20 %, and from 87 % to 70 % as SR degradation increased (p < 0.001).
Conclusion: AI contouring accuracy improved with software updates and showed robustness to minor reconstruction variations, but it was sensitive to noise and SR degradation. Continuous validation and quality control of AI-generated contours are essential. Future studies should include a broader range of anatomical regions and larger cohorts.
期刊介绍:
Physica Medica, European Journal of Medical Physics, publishing with Elsevier from 2007, provides an international forum for research and reviews on the following main topics:
Medical Imaging
Radiation Therapy
Radiation Protection
Measuring Systems and Signal Processing
Education and training in Medical Physics
Professional issues in Medical Physics.