Generalizability, robustness, and correction bias of segmentations of thoracic organs at risk in CT images.

IF 4.7 | Zone 2 (Medicine) | JCR Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
European Radiology Pub Date : 2025-07-01 Epub Date: 2024-12-31 DOI:10.1007/s00330-024-11321-2
Corentin Guérendel, Liliana Petrychenko, Kalina Chupetlovska, Zuhir Bodalal, Regina G H Beets-Tan, Sean Benson
Pages: 4335-4346. Publication type: Journal Article.
Citations: 0

Abstract

Objective: This study aims to assess and compare two state-of-the-art deep learning approaches for segmenting four thoracic organs at risk (OAR), namely the esophagus, trachea, heart, and aorta, in CT images in the context of radiotherapy planning.

Materials and methods: We compare a multi-organ segmentation approach with the fusion of multiple single-organ models, each dedicated to one OAR. All models were trained using nnU-Net with the default parameters and the full-resolution configuration. We evaluate their robustness to adversarial perturbations and their generalizability on external datasets, and we explore potential biases introduced by expert corrections compared to fully manual delineations.
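The abstract does not say how the outputs of the four single-organ models are fused. As a minimal sketch, assuming each model yields a per-voxel foreground probability map, one simple rule is to assign each voxel to the most probable organ and fall back to background when no organ reaches a threshold. The function name, the label convention (0 = background, 1 to 4 = organs), and the 0.5 threshold are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Label convention assumed for illustration: 0 = background, 1..4 = organs.
ORGANS = ["esophagus", "trachea", "heart", "aorta"]

def fuse_single_organ_probs(probs, threshold=0.5):
    """Fuse per-organ foreground probability maps into one label map.

    probs: array of shape (n_organs, *spatial) with values in [0, 1].
    Each voxel receives the label of its most probable organ, or 0
    (background) if no organ probability reaches the threshold.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=0)        # index of the most probable organ
    best_prob = probs.max(axis=0)      # that organ's probability
    return np.where(best_prob >= threshold, best + 1, 0).astype(np.uint8)
```

Resolving overlaps by maximum probability keeps the fused map a valid single-label segmentation, which makes it directly comparable with the output of a multi-organ model.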

Results: The two approaches show excellent performance, with an average Dice score of 0.928 for the multi-class setting and 0.930 when fusing the four single-organ models. Evaluation on external datasets and under common procedural adversarial noise demonstrates the good generalizability of these models. In addition, expert corrections of both models show significant bias toward the original automated segmentation. The average Dice score between the two corrections is 0.93, ranging from 0.88 for the trachea to 0.98 for the heart.
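The Dice scores reported above measure the overlap between two masks A and B as 2|A ∩ B| / (|A| + |B|). The following sketch shows how a per-organ and an average Dice score could be computed from two label maps. The function names and the convention of returning 1.0 when an organ is absent from both maps are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import numpy as np

def dice_score(seg_a, seg_b, label):
    """Dice overlap for one label between two label maps of equal shape."""
    a_mask = np.asarray(seg_a) == label
    b_mask = np.asarray(seg_b) == label
    denom = a_mask.sum() + b_mask.sum()
    if denom == 0:
        # Assumed convention: an organ absent from both maps counts as agreement.
        return 1.0
    return float(2.0 * np.logical_and(a_mask, b_mask).sum() / denom)

def mean_dice(seg_a, seg_b, labels):
    """Average of the per-label Dice scores."""
    return float(np.mean([dice_score(seg_a, seg_b, lab) for lab in labels]))
```

The same function applies whether the two inputs are a prediction and a manual delineation, or two expert corrections of different model outputs, as in the correction-bias comparison.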

Conclusion: Both approaches demonstrate excellent performance and generalizability in segmenting four thoracic OARs, potentially improving efficiency in radiotherapy planning. However, the multi-organ setting is more efficient, requiring less training time and fewer resources, which makes it the preferable choice for this task. Moreover, corrections of AI segmentations by clinicians may bias the reported results of AI approaches. A fully manually annotated test set should be used to assess the performance of such methods.

Key points:
Question: While manual delineation of thoracic organs at risk is labor-intensive, prone to errors, and time-consuming, evaluation of AI models performing this task lacks robustness.
Findings: The deep-learning model using the nnU-Net framework showed excellent performance, generalizability, and robustness in segmenting thoracic organs in CT, enhancing radiotherapy planning efficiency.
Clinical relevance: Automatic segmentation of thoracic organs at risk can save clinicians time without compromising the quality of the delineations, and extensive evaluation across diverse settings demonstrates the potential of integrating such models into clinical practice.

Source journal: European Radiology
Category: Medicine, Nuclear Medicine
CiteScore: 11.60
Self-citation rate: 8.50%
Articles published: 874
Review time: 2-4 weeks
Journal description: European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004 to 2008, supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.