A full-scale attention-augmented CNN-transformer model for segmentation of oropharyngeal mucosa organs-at-risk in radiotherapy.

IF 2.0 | Zone 4 (Medicine) | JCR Q3 | ENGINEERING, BIOMEDICAL
Lian He, Jianda Sun, Shanfu Lu, Jingyang Li, Xiaoqing Wang, Ziye Yan, Jian Guan
Journal: Physical and Engineering Sciences in Medicine
DOI: 10.1007/s13246-025-01614-1 (https://doi.org/10.1007/s13246-025-01614-1)
Published: 2025-09-11
Publication type: Journal Article
Citations: 0

Abstract

Radiation-induced oropharyngeal mucositis (ROM) is a common and severe side effect of radiotherapy in nasopharyngeal cancer patients, leading to significant clinical complications such as malnutrition, infections, and treatment interruptions. Accurate delineation of the oropharyngeal mucosa (OPM) as an organ-at-risk (OAR) is crucial to minimizing radiation exposure and preventing ROM. This study aims to develop and validate an automatic segmentation model, the attention-augmented Swin U-Net transformer (AA-Swin UNETR), for accurate delineation of the OPM to improve radiotherapy planning and reduce the incidence of ROM.

We propose a hybrid CNN-transformer model, AA-Swin UNETR, based on the Swin UNETR framework, which integrates hierarchical feature extraction with full-scale attention mechanisms. The model comprises a Swin Transformer-based encoder and a CNN-based decoder with residual blocks, connected via a full-scale feature connection scheme. The full-scale attention mechanism enables the model to capture long-range dependencies and multi-level features, enhancing segmentation accuracy. The model was trained on a dataset of 202 CT scans from Nanfang Hospital, using expert manual delineations as the gold standard. We evaluated AA-Swin UNETR against state-of-the-art (SOTA) segmentation models, namely Swin UNETR, nnUNet, and 3D UX-Net, using geometric and dosimetric evaluation parameters. The geometric metrics are the Dice similarity coefficient (DSC), surface DSC (sDSC), volume similarity (VS), Hausdorff distance (HD), precision, and recall. The dosimetric metrics are the differences in D0.1cc and Dmean between the manually delineated OPM and the auto-segmented OPM.

The AA-Swin UNETR model achieved the highest mean DSC of 87.72 ± 1.98%, significantly outperforming Swin UNETR (83.53 ± 2.59%), nnUNet (85.48 ± 2.68%), and 3D UX-Net (80.04 ± 3.76%). The model also showed superior mean sDSC (98.44 ± 1.08%), mean VS (97.86 ± 1.43%), mean precision (87.60 ± 3.06%), and mean recall (89.22 ± 2.70%), with a competitive mean HD of 9.03 ± 2.79 mm. In the dosimetric evaluation, the proposed model produced the smallest-magnitude mean [Formula: see text] (0.46 ± 4.92 cGy) and mean [Formula: see text] (6.26 ± 24.90 cGy) relative to manual delineation among the auto-segmentation models (mean [Formula: see text]: Swin UNETR = -0.56 ± 7.28 cGy, nnUNet = 0.99 ± 4.73 cGy, 3D UX-Net = -0.65 ± 8.05 cGy; mean [Formula: see text]: Swin UNETR = 7.46 ± 43.37 cGy, nnUNet = 21.76 ± 37.86 cGy, 3D UX-Net = 44.61 ± 62.33 cGy).

In this paper, we proposed AA-Swin UNETR, a hybrid transformer-CNN deep-learning model for automatic segmentation of the OPM as an OAR structure in radiotherapy planning. Evaluations with geometric and dosimetric parameters demonstrated that AA-Swin UNETR can generate delineations close to a manual reference in terms of both geometry and dose-volume metrics. The proposed model outperformed existing SOTA models on both sets of evaluation metrics and demonstrated its capability to accurately segment the complex anatomical structures of the OPM, providing a reliable tool for enhancing radiotherapy planning.
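
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how the voxel-overlap metrics (DSC, precision, recall, VS) and the two dose metrics (Dmean, D0.1cc) are commonly computed from binary masks and a dose grid. It is not the authors' implementation. The function names `overlap_metrics` and `dose_metrics`, the `voxel_volume_cc` parameter, and the choice of the widely used definition VS = 1 - ||A| - |B|| / (|A| + |B|) are assumptions made for illustration; the paper may define or compute these quantities differently (for example, by interpolating a dose-volume histogram). Surface DSC and Hausdorff distance are omitted because they require surface extraction and voxel spacing.

```python
import numpy as np


def overlap_metrics(pred, ref):
    """Voxel-overlap metrics for two binary masks of the same shape.

    pred: automatic segmentation, ref: manual reference (hypothetical names).
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.count_nonzero(pred & ref)   # voxels labelled by both masks
    n_pred = np.count_nonzero(pred)     # voxels in the prediction
    n_ref = np.count_nonzero(ref)       # voxels in the reference
    if n_pred == 0 or n_ref == 0:
        raise ValueError("both masks must contain at least one voxel")
    return {
        "dsc": 2.0 * tp / (n_pred + n_ref),
        "precision": tp / n_pred,
        "recall": tp / n_ref,
        # Assumed definition: 1 - |volume difference| / (sum of volumes)
        "vs": 1.0 - abs(n_pred - n_ref) / (n_pred + n_ref),
    }


def dose_metrics(dose, mask, voxel_volume_cc):
    """Dmean and D0.1cc of the dose inside a binary mask.

    D0.1cc is taken here as the minimum dose among the hottest voxels
    that together make up 0.1 cc (no DVH interpolation; an assumption).
    """
    dose = np.asarray(dose, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    doses = np.sort(dose[mask])[::-1]   # in-structure doses, hottest first
    if doses.size == 0:
        raise ValueError("mask is empty")
    # Number of voxels covering 0.1 cc; small tolerance guards float round-off
    n_hot = int(np.ceil(0.1 / voxel_volume_cc - 1e-9))
    n_hot = min(max(n_hot, 1), doses.size)
    return {"dmean": float(doses.mean()), "d0_1cc": float(doses[n_hot - 1])}
```

Under these assumptions, the dosimetric differences reported in the abstract would correspond to calling `dose_metrics` once with the manual mask and once with the automatic mask on the same dose grid, then subtracting the results.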

Source journal metrics: CiteScore 8.40; self-citation rate 4.50%; articles published 110