Robustness evaluation of an artificial intelligence-based automatic contouring software in daily routine practice.

IF 2.7 | Zone 3 (Medicine) | Q1 | RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Physica Medica-European Journal of Medical Physics | Pub Date: 2025-09-01 | Epub Date: 2025-08-07 | DOI: 10.1016/j.ejmp.2025.105065

Jimmy Fontaine, Maud Suszko, Francesca di Franco, Agathe Leroux, Emilie Bonnet, Mathieu Bosset, Julien Langrand-Escure, Sébastien Clippe, Bertrand Fleury, Jean-Baptiste Guy

Citations: 0

Abstract

Robustness evaluation of an artificial intelligence-based automatic contouring software in daily routine practice.

Purpose: AI-based automatic contouring streamlines radiotherapy by reducing contouring time but requires rigorous validation and ongoing daily monitoring. This study assessed how software updates affect contouring accuracy and examined how image quality variations influence AI performance.

Methods: Two patient cohorts were analyzed. The software updates cohort (40 CT scans: 20 thorax, 10 pelvis, 10 H&N) compared six versions of Limbus AI contouring software. The image quality cohort (20 patients: H&N, pelvis, brain, thorax) analyzed 12 reconstructions per patient using Standard, iDose, and IMR algorithms, with simulated noise and spatial resolution (SR) degradations. AI performance was assessed using Volumetric Dice Similarity Coefficient (vDSC) and 95 % Hausdorff Distance (HD95%) with Wilcoxon tests for significance.
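The two metrics named in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how HD95% was computed, so the choices here are assumptions (surface voxels found by 6-connectivity, a nearest-rank percentile, and the maximum of the two directed 95th-percentile distances). The helper names `cube`, `surface`, `percentile`, `vdsc` and `hd95` are hypothetical, and the masks are toy cubes rather than clinical contours.

```python
import math
from itertools import product

# 6-connected face neighbours, used to find boundary voxels.
FACE_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cube(lo, hi, shift=(0, 0, 0)):
    """Toy mask: voxel indices of the cube [lo, hi)^3, optionally shifted."""
    return {(x + shift[0], y + shift[1], z + shift[2])
            for x, y, z in product(range(lo, hi), repeat=3)}


def vdsc(a, b):
    """Volumetric Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels that have at least one face neighbour outside the mask."""
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
                   for dx, dy, dz in FACE_STEPS)}


def percentile(values, q):
    """Nearest-rank percentile (an assumed convention, others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile Hausdorff distance in the units of `spacing` (e.g. mm).

    Brute-force nearest-neighbour search: fine for toy masks, too slow
    for full-resolution CT structures.
    """
    sa = [tuple(i * s for i, s in zip(v, spacing)) for v in surface(a)]
    sb = [tuple(i * s for i, s in zip(v, spacing)) for v in surface(b)]
    if not sa or not sb:
        raise ValueError("both masks must be non-empty")
    a_to_b = [min(math.dist(p, q) for q in sb) for p in sa]
    b_to_a = [min(math.dist(q, p) for p in sa) for q in sb]
    return max(percentile(a_to_b, 95), percentile(b_to_a, 95))


reference = cube(0, 4)                    # 4 x 4 x 4 voxel "reference contour"
predicted = cube(0, 4, shift=(1, 0, 0))   # same cube shifted by one voxel in x
print(vdsc(reference, predicted))         # 0.75
print(hd95(reference, predicted))         # 1.0
```

A one-voxel shift of a small cube already lowers vDSC to 0.75 while HD95% stays at one voxel width, which is why the two metrics are usually reported together: the first is sensitive to overlap volume, the second to boundary displacement.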

Results: In the software updates cohort, vDSC improved for re-trained structures across versions (mean DSC ≥ 0.75), with breast contour vDSC decreasing by 1 % between v1.5 and v1.8B3 (p > 0.05). Median HD95% values were consistently <4 mm, <5 mm, and <12 mm for H&N, pelvis, and thorax contours, respectively (p > 0.05). In the image quality cohort, no significant differences were observed between Standard, iDose, and IMR algorithms. However, noise and SR degradation significantly reduced performance: the proportion of contours with vDSC ≥ 0.9 dropped from 89 % at 2 % noise to 30 % at 20 % noise, and from 87 % to 70 % as SR degradation increased (p < 0.001).
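The kind of comparison reported in the Results can be sketched as follows, reading "vDSC ≥ 0.9 dropped from 89 % ... to 30 %" as the share of contours that reach that threshold. The abstract only says "Wilcoxon tests", so the paired signed-rank variant used here is an assumption, as are the function names `share_at_least` and `wilcoxon_signed_rank`. The vDSC values are synthetic and chosen for illustration only; they are not study data.

```python
def share_at_least(scores, threshold):
    """Fraction of scores that reach the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped.
    The exact null distribution is enumerated, so keep samples small.
    """
    diffs = [round(a - b, 9) for a, b in zip(x, y)]  # rounding avoids float noise
    diffs = [d for d in diffs if d != 0]
    if not diffs:
        raise ValueError("all paired differences are zero")

    # Rank |d| with average ranks for ties. Ranks are stored doubled so
    # that half-integer average ranks remain exact integers.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average of ranks i+1 .. j+1
        i = j + 1

    total2 = sum(ranks2)
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w2 = min(w_plus2, total2 - w_plus2)

    # Under the null hypothesis each difference is + or - with equal
    # probability; count how many sign patterns give each rank sum.
    counts = {0: 1}
    for r in ranks2:
        nxt = {}
        for s, c in counts.items():
            nxt[s] = nxt.get(s, 0) + c
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt
    extreme = sum(c for s, c in counts.items() if min(s, total2 - s) <= w2)
    return w2 / 2, extreme / 2 ** len(diffs)


# Synthetic vDSC values for the same eight contours at two noise levels.
low_noise = [0.95, 0.93, 0.91, 0.94, 0.92, 0.96, 0.90, 0.89]
high_noise = [0.90, 0.85, 0.88, 0.80, 0.91, 0.86, 0.75, 0.70]

print(share_at_least(low_noise, 0.9))    # 0.875
print(share_at_least(high_noise, 0.9))   # 0.25
print(wilcoxon_signed_rank(low_noise, high_noise))  # (0.0, 0.0078125)
```

In routine use a library routine such as `scipy.stats.wilcoxon` would normally replace the hand-written test; the enumeration above is kept only so that the sketch runs without third-party packages.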

Conclusion: AI contouring accuracy improved with software updates and showed robustness to minor reconstruction variations, but it was sensitive to noise and SR degradation. Continuous validation and quality control of AI-generated contours are essential. Future studies should include a broader range of anatomical regions and larger cohorts.

Source journal

Physica Medica-European Journal of Medical Physics (Biology - Biophysics)

CiteScore: 6.80

Self-citation rate: 14.70%

Articles published: 493

Review time: 78 days

Journal description: Physica Medica, European Journal of Medical Physics, published by Elsevier since 2007, provides an international forum for research and reviews on the following main topics: Medical Imaging; Radiation Therapy; Radiation Protection; Measuring Systems and Signal Processing; Education and Training in Medical Physics; Professional Issues in Medical Physics.