Uncertainty-aware deep learning for segmentation of primary tumor and pathologic lymph nodes in oropharyngeal cancer: Insights from a multi-center cohort

IF 5.4 · Zone 2 (Medicine) · Q1 ENGINEERING, BIOMEDICAL

Computerized Medical Imaging and Graphics · Pub Date: 2025-03-25 · DOI: 10.1016/j.compmedimag.2025.102535

Alessia De Biase, Nanna Maria Sijtsema, Lisanne V. van Dijk, Roel Steenbakkers, Johannes A. Langendijk, Peter van Ooijen

Volume 123, Article 102535 · https://www.sciencedirect.com/science/article/pii/S0895611125000448


Abstract

Purpose

Information on the accuracy of deep learning (DL) tumor segmentation at both the voxel and the structure level is essential for clinical introduction. In a previous study, a DL model was developed for oropharyngeal cancer (OPC) primary tumor (PT) segmentation in PET/CT images, and voxel-level predicted probabilities (TPM) quantifying model certainty were introduced. This study extended the network to simultaneously generate TPMs for the PT and pathologic lymph nodes (PL) and explored whether structure-level uncertainty in TPMs predicts segmentation model accuracy in an independent external cohort.

Methods

We retrospectively gathered PET/CT images and manual delineations of the gross tumor volume of the PT (GTVp) and PL (GTVln) of 407 OPC patients treated with (chemo)radiation at our institute. The HECKTOR 2022 challenge dataset served as the external test set. The pre-existing architecture was modified for multi-label segmentation. Multiple models were trained, and the non-binarized ensemble average of TPMs was considered per patient. Segmentation accuracy was quantified by surface and aggregate DSC, and model uncertainty by the coefficient of variation (CV) of multiple predictions.
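The ensemble averaging and CV-based uncertainty described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.5 threshold used to select voxels, and the choice to average the voxel-wise CV over that region are all assumptions, since the abstract does not specify how the structure-level CV is computed.

```python
import numpy as np

def ensemble_tpm(tpms):
    """Average the non-binarized probability maps of several models.

    tpms: array-like of shape (n_models, ...) with values in [0, 1].
    """
    return np.asarray(tpms, dtype=float).mean(axis=0)

def structure_cv(tpms, threshold=0.5, eps=1e-8):
    """Hypothetical structure-level uncertainty score.

    Computes the voxel-wise coefficient of variation (std / mean) across
    models and averages it over voxels whose ensemble mean reaches
    `threshold`. Both the threshold and the averaging are assumptions.
    """
    tpms = np.asarray(tpms, dtype=float)
    mean = tpms.mean(axis=0)
    std = tpms.std(axis=0)          # population std across models
    mask = mean >= threshold        # assumed "predicted structure" region
    if not mask.any():
        return float("nan")         # no structure predicted
    return float((std[mask] / (mean[mask] + eps)).mean())

# Toy example: two models, one row of three voxels (illustrative values only)
toy = np.array([[[0.6, 0.9, 0.1]],
                [[0.8, 0.9, 0.1]]])
print(ensemble_tpm(toy))
print(structure_cv(toy))
```

In this sketch, a higher score means the models disagree more inside the predicted structure, which is the kind of signal the study relates to lower segmentation accuracy.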

Results

Predicted GTVp and GTVln segmentations in the external test set achieved aggregate DSC values of 0.75 and 0.70, respectively. Patient-specific CV and surface DSC were significantly correlated for both structures in the external set (−0.54 for GTVp and −0.66 for GTVln), indicating significant calibration.
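For readers unfamiliar with the two quantities reported here, the sketch below shows how an aggregate DSC and a correlation between per-patient uncertainty and accuracy could be computed. The aggregate DSC pools intersections and volumes over all patients before taking the ratio. The abstract does not state which correlation coefficient was used; a Spearman rank correlation (without tie correction) is assumed purely for illustration, and the toy numbers are not study data.

```python
import numpy as np

def aggregate_dsc(preds, refs):
    """Aggregate Dice: 2 * sum_i |P_i & R_i| / sum_i (|P_i| + |R_i|)."""
    inter = 0
    total = 0
    for p, r in zip(preds, refs):
        p = np.asarray(p, dtype=bool)
        r = np.asarray(r, dtype=bool)
        inter += np.logical_and(p, r).sum()
        total += p.sum() + r.sum()
    return float(2.0 * inter / total) if total > 0 else float("nan")

def spearman_rho(x, y):
    """Rank correlation via Pearson on ranks (assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy masks for two patients (illustrative only)
preds = [[1, 1, 0, 0], [0, 0, 0, 0]]
refs = [[1, 0, 0, 0], [0, 0, 0, 1]]
print(aggregate_dsc(preds, refs))

# Toy per-patient uncertainty and accuracy values (illustrative only)
cv = [0.10, 0.20, 0.30, 0.40]
surface_dsc = [0.90, 0.80, 0.60, 0.50]
print(spearman_rho(cv, surface_dsc))
```

A negative coefficient, as reported in the study, means that patients with higher CV tend to have lower surface DSC.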

Conclusion

Significant accuracy versus uncertainty calibration was achieved for TPMs in both internal and external test sets, indicating the potential use of quantified uncertainty from TPMs to identify cases with lower GTVp and GTVln segmentation accuracy, independently of the dataset.




Source journal

Computerized Medical Imaging and Graphics (Medicine: Nuclear Medicine)

CiteScore: 10.70

Self-citation rate: 3.50%

Review time: 26 days

Journal description: The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.