Comparison of the output of a deep learning segmentation model for locoregional breast cancer radiotherapy trained on 2 different datasets

Pub Date: 2023-06-01. DOI: 10.1016/j.tipsro.2023.100209

Nienke Bakx, Maurice van der Sangen, Jacqueline Theuws, Hanneke Bluemink, Coen Hurkmans

Volume 26, Article 100209. Full text: https://www.sciencedirect.com/science/article/pii/S2405632423000094 (PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/cd/11/main.PMC10199413.pdf)

Citations: 1

Abstract

Introduction

Deep learning (DL) models for auto-segmentation are being developed at an increasing rate, and more of them are becoming commercially available. Commercial models are mostly trained on external data. To study the effect of using a model trained on external data rather than the same model trained on in-house data, the performance of these two DL models was evaluated.

Methods

The evaluation was performed on in-house data from 30 breast cancer patients. Quantitative analysis used the Dice similarity coefficient (DSC), the surface DSC (sDSC), and the 95th percentile of the Hausdorff distance (95% HD). These values were compared with previously reported inter-observer variations (IOV).
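For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how they can be computed for a pair of binary masks. It is not the authors' evaluation code. The function names, the `spacing` and `tolerance_mm` parameters, the face-neighbour definition of a surface voxel, and the choice to take the larger of the two directed 95th percentiles for the 95% HD are all assumptions made for illustration; other conventions exist, and the abstract does not state which one or which sDSC tolerance was used.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels with a background face-neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            # A voxel stays "interior" only if every face-neighbour is foreground.
            interior &= np.roll(padded, shift, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    surface = mask & ~interior[core]
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def surface_distances(a, b, spacing):
    """Nearest-neighbour distances from A's surface to B's surface and back."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must contain at least one voxel")
    # Brute-force pairwise distances; fine for a sketch, slow for large volumes.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return d.min(axis=1), d.min(axis=0)


def hd95(a, b, spacing):
    """95% HD, taken here as the larger of the two directed 95th percentiles."""
    a_to_b, b_to_a = surface_distances(a, b, spacing)
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


def surface_dice(a, b, spacing, tolerance_mm):
    """Fraction of surface points lying within tolerance_mm of the other surface."""
    a_to_b, b_to_a = surface_distances(a, b, spacing)
    within = (a_to_b <= tolerance_mm).sum() + (b_to_a <= tolerance_mm).sum()
    return float(within / (len(a_to_b) + len(b_to_a)))
```

In this sketch, `a` would be a model contour and `b` the reference contour, both rasterised onto the same grid, with `spacing` giving the voxel size in mm along each axis.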

Results

For a number of structures, statistically significant differences were found between the two models. For organs at risk, mean DSC values ranged from 0.63 to 0.98 for the in-house model and from 0.71 to 0.96 for the external model. For target volumes, mean DSC values ranged from 0.57 to 0.94 and from 0.33 to 0.92, respectively. The difference in 95% HD between the two models ranged from 0.08 to 3.23 mm, except for CTVn4, where it was 9.95 mm. For the external model, both DSC and 95% HD fall outside the IOV range for CTVn4, whereas for the in-house model the DSC for the thyroid falls outside the IOV range.

Conclusions

Statistically significant differences were found between the two models. These differences were mostly within published inter-observer variations, which indicates that both models are clinically useful. Our findings could encourage discussion and revision of existing guidelines, to further decrease not only inter-observer but also inter-institute variability.


