A CNN-transformer-based hybrid U-shape model with long-range relay for esophagus 3D CT image gross tumor volume segmentation.
Songli Yu, Yunxiang Li, Pengfei Jiao, Yixiu Liu, Jianxiang Zhao, Chenggang Yan, Qifeng Wang, Shuai Wang
Medical Physics, published 2025-04-14. DOI: 10.1002/mp.17818
Background: Accurate and reliable segmentation of the esophageal gross tumor volume (GTV) in computed tomography (CT) is beneficial for diagnosis and treatment. However, it remains a challenging task because the esophagus has a variable shape and an extensive vertical range, so tumors can appear at any position within it.
Purpose: This study introduces a novel CNN-transformer-based U-shape model (LRRM-U-TransNet) designed to enhance the segmentation accuracy of esophageal GTV. By leveraging advanced deep learning techniques, we aim to address the challenges posed by the variable shape and extensive range of the esophagus, ultimately improving diagnostic and treatment outcomes.
Methods: We propose a long-range relay mechanism that aggregates feature information from all layers by progressively passing adjacent-layer feature maps along pixel and semantic pathways, together with two ready-to-use blocks that implement this mechanism. The Dual FastViT block fuses feature maps from the two pathways to enhance feature representation, and the Dual AxialViT block acts as a secondary auxiliary bottleneck that captures global information for more precise feature map reconstruction.
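As a rough illustration of the relay idea, the sketch below progressively fuses adjacent-layer feature maps into a running state, so the final output carries information from every layer; the `fuse` rule and shapes are hypothetical simplifications, not the paper's actual blocks.

```python
import numpy as np

def relay_pathway(features, fuse=lambda state, f: 0.5 * (state + f)):
    """Progressively relay feature maps through adjacent layers.

    `features` is a list of same-shaped arrays ordered shallow to deep.
    The running state is fused with each successive layer's map, so the
    result aggregates information from all layers. The averaging `fuse`
    stands in for a learned interaction block.
    """
    state = features[0]
    for f in features[1:]:
        state = fuse(state, f)  # pass the running state on to the next layer
    return state

# Two pathways (e.g., pixel and semantic) can be relayed independently:
layers = [np.full((4, 4), float(i)) for i in range(1, 4)]  # layers 1, 2, 3
pixel_out = relay_pathway(layers)
```

With the averaging rule, the deepest layers dominate the relayed state while shallow-layer information is still retained, which mirrors the stated goal of preventing early information loss.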
Results: We build a new esophageal tumor dataset of 1665 real-world patient CT samples annotated by five expert radiologists and employ multiple evaluation metrics to validate our model. Five-fold cross-validation on this dataset shows that LRRM-U-TransNet achieves a Dice coefficient of 0.834, a Jaccard coefficient of 0.730, a precision of 0.840, an HD95 of 3.234 mm, and a volume similarity of 0.143.
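The overlap metrics reported above can be computed from binary segmentation masks as follows; this is the standard formulation of Dice, Jaccard, and precision, not the paper's exact evaluation code.

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient: 2|P∩G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard(pred, gt):
    """Jaccard coefficient (IoU): |P∩G| / |P∪G|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def precision(pred, gt):
    """Fraction of predicted tumor voxels that are truly tumor."""
    tp = np.logical_and(pred, gt).sum()
    return tp / pred.sum()

# Toy 1x4 masks: 1 true positive, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 1, 0]], dtype=bool)
# Dice = 2*1/(2+2) = 0.5; Jaccard = 1/3; precision = 1/2
```

Dice and Jaccard are monotonically related (Dice = 2J/(1+J)), which is why both values in the results move together.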
Conclusions: We propose a CNN-Transformer hybrid deep learning network that improves esophageal tumor segmentation. It exploits local and global information exchanged between shallower and deeper layers to prevent early information loss and to enhance cross-layer communication. To validate the model, we collected a dataset of 1665 CT images of esophageal tumors from Sichuan Tumor Hospital, on which our model outperforms state-of-the-art models. This work is significant for improving the accuracy and clinical applicability of esophageal tumor segmentation.