Performance of multi-vendor auto-segmentation models for thoracic organs at risk trained on a single dataset

Impact factor 2.7; CAS journal ranking Zone 3 (Medicine); JCR Q1 (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)
Sevgi Emin , Elia Rossi , Mattias Hedman , Marcela Giovenco , Fernanda Villegas , Eva Onjukka
Physica Medica-European Journal of Medical Physics, Volume 137, Article 105089 (published 2025-08-23). DOI: 10.1016/j.ejmp.2025.105089. Full text: https://www.sciencedirect.com/science/article/pii/S1120179725001991
Citations: 0

Introduction

This study evaluates the delineation quality of artificial intelligence (AI)-based auto-segmentation models trained on the same dataset. A common training dataset is needed because the intrinsic performance of commercial solutions cannot be evaluated when each is trained on a different dataset. A diverse set of challenging thoracic organs at risk (OAR) was chosen to reveal potential limitations of AI-based tools that are relevant to their clinical adoption.

Materials & Methods

For 250 patients with lung tumours (200 for training, 50 for testing), a structure set of 16 OAR was delineated and reviewed by radiation oncology experts. Three participating vendors had access to the training dataset for a limited time, during which each developed a model following its commercial model-development strategy.
The authors tested the models on the blind test dataset. The quantitative analysis used the Dice similarity coefficient (DSC), surface DSC (sDSC), the 95th percentile of the Hausdorff distance (HD95) and the average symmetric surface distance (ASSD). Inter-observer variability in manual segmentation was estimated from three independent expert delineations of a subset of five test patients.
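The four metrics named above can be sketched for binary 3D masks as follows. This is a minimal illustration, not the authors' evaluation code, and it rests on several assumptions: both masks are non-empty and share one voxel grid with known spacing in mm; surfaces are taken as the mask voxels removed by one binary erosion step (a voxel-based approximation); HD95 and ASSD pool the surface distances from both directions (other HD95 variants exist); and the sDSC tolerance `tolerance_mm` is a free parameter because the abstract does not state the value used. The helper `interobserver_mean` is a hypothetical name showing one plausible way to summarise three expert delineations of the same patient.

```python
from itertools import combinations

import numpy as np
from scipy import ndimage


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Volumetric Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return float(2.0 * np.logical_and(a, b).sum() / denom) if denom else 1.0


def _surface(mask: np.ndarray) -> np.ndarray:
    """Surface voxels (assumption): mask voxels removed by one erosion step."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def _surface_distances(a, b, spacing):
    """Distances (mm) from each surface voxel of a to the surface of b, and from b to a."""
    sa, sb = _surface(a), _surface(b)
    if not sa.any() or not sb.any():
        raise ValueError("both masks must be non-empty")
    # The EDT of the inverted surface gives, at every voxel, the distance to the
    # nearest surface voxel; `sampling` scales each axis by its voxel spacing.
    dist_to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    return dist_to_b[sa], dist_to_a[sb]


def hd95(a, b, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th percentile of the pooled two-direction surface distances (assumed variant)."""
    d_ab, d_ba = _surface_distances(a, b, spacing)
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))


def assd(a, b, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance: mean over both directions."""
    d_ab, d_ba = _surface_distances(a, b, spacing)
    return float(np.concatenate([d_ab, d_ba]).mean())


def surface_dice(a, b, tolerance_mm, spacing=(1.0, 1.0, 1.0)) -> float:
    """Fraction of all surface voxels lying within tolerance_mm of the other surface."""
    d_ab, d_ba = _surface_distances(a, b, spacing)
    within = np.count_nonzero(d_ab <= tolerance_mm) + np.count_nonzero(d_ba <= tolerance_mm)
    return float(within / (d_ab.size + d_ba.size))


def interobserver_mean(masks, metric, **kwargs) -> float:
    """Hypothetical helper: mean of `metric` over all observer pairs (3 observers give 3 pairs)."""
    return float(np.mean([metric(m1, m2, **kwargs) for m1, m2 in combinations(masks, 2)]))


if __name__ == "__main__":
    # Toy masks (not study data): a 6x6x6 cube and the same cube shifted by one voxel.
    a = np.zeros((12, 12, 12), dtype=bool)
    a[2:8, 2:8, 2:8] = True
    b = np.roll(a, 1, axis=0)
    print(dice(a, b), hd95(a, b), assd(a, b), surface_dice(a, b, tolerance_mm=1.0))
```

In the toy example every surface voxel lies at most one voxel from the other surface, so HD95 stays at 1 mm while DSC falls to about 0.83, a small illustration that overlap and surface-distance metrics capture different aspects of agreement.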

Results

Thirteen OAR had DSC > 0.8, nine had sDSC > 0.8, ten had ASSD < 0.5 mm and five had HD95 < 1 mm. The most challenging structures to auto-segment were the brachial plexus, the pulmonary vein and the inferior vena cava. For all metrics, the overall results of all models exceeded the inter-observer variability.

Conclusion

While the evaluated AI models perform very well for some OAR, they appear less successful at modelling organs with branching structures and poor image contrast, even when trained on a large, homogeneous dataset.
Source journal
CiteScore: 6.80
Self-citation rate: 14.70%
Articles published: 493
Review time: 78 days
Journal description: Physica Medica, European Journal of Medical Physics, published by Elsevier since 2007, provides an international forum for research and reviews on the following main topics: Medical Imaging; Radiation Therapy; Radiation Protection; Measuring Systems and Signal Processing; Education and Training in Medical Physics; Professional Issues in Medical Physics.