USCT-UNet：利用多通道融合转换器从 U 形跳接重新思考 U-Net 网络中的语义差距

IF 4.8 2区医学 Q2 ENGINEERING, BIOMEDICAL

IEEE Transactions on Neural Systems and Rehabilitation Engineering Pub Date : 2024-09-26 DOI:10.1109/TNSRE.2024.3468339

Xiaoshan Xie;Min Yang

{"title":"USCT-UNet：利用多通道融合转换器从 U 形跳接重新思考 U-Net 网络中的语义差距","authors":"Xiaoshan Xie;Min Yang","doi":"10.1109/TNSRE.2024.3468339","DOIUrl":null,"url":null,"abstract":"Medical image segmentation is a crucial component of computer-aided clinical diagnosis, with state-of-the-art models often being variants of U-Net. Despite their success, these models’ skip connections introduce an unnecessary semantic gap between the encoder and decoder, which hinders their ability to achieve the high precision required for clinical applications. Awareness of this semantic gap and its detrimental influences have increased over time. However, a quantitative understanding of how this semantic gap compromises accuracy and reliability remains lacking, emphasizing the need for effective mitigation strategies. In response, we present the first quantitative evaluation of the semantic gap between corresponding layers of U-Net and identify two key characteristics: 1) The direct skip connection (DSC) exhibits a semantic gap that negatively impacts models’ performance; 2) The magnitude of the semantic gap varies across different layers. Based on these findings, we re-examine this issue through the lens of skip connections. We introduce a Multichannel Fusion Transformer (MCFT) and propose a novel USCT-UNet architecture, which incorporates U-shaped skip connections (USC) to replace DSC, allocates varying numbers of MCFT blocks based on the semantic gap magnitude at different layers, and employs a spatial channel cross-attention (SCCA) module to facilitate the fusion of features between the decoder and USC. We evaluate USCT-UNet on four challenging datasets, and the results demonstrate that it effectively eliminates the semantic gap. Compared to using DSC, our USC and SCCA strategies achieve maximum improvements of 4.79% in the Dice coefficient, 5.70% in mean intersection over union (MIoU), and 3.26 in Hausdorff distance.","PeriodicalId":13419,"journal":{"name":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","volume":"32 ","pages":"3782-3793"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695464","citationCount":"0","resultStr":"{\"title\":\"USCT-UNet: Rethinking the Semantic Gap in U-Net Network From U-Shaped Skip Connections With Multichannel Fusion Transformer\",\"authors\":\"Xiaoshan Xie;Min Yang\",\"doi\":\"10.1109/TNSRE.2024.3468339\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Medical image segmentation is a crucial component of computer-aided clinical diagnosis, with state-of-the-art models often being variants of U-Net. Despite their success, these models’ skip connections introduce an unnecessary semantic gap between the encoder and decoder, which hinders their ability to achieve the high precision required for clinical applications. Awareness of this semantic gap and its detrimental influences have increased over time. However, a quantitative understanding of how this semantic gap compromises accuracy and reliability remains lacking, emphasizing the need for effective mitigation strategies. In response, we present the first quantitative evaluation of the semantic gap between corresponding layers of U-Net and identify two key characteristics: 1) The direct skip connection (DSC) exhibits a semantic gap that negatively impacts models’ performance; 2) The magnitude of the semantic gap varies across different layers. Based on these findings, we re-examine this issue through the lens of skip connections. We introduce a Multichannel Fusion Transformer (MCFT) and propose a novel USCT-UNet architecture, which incorporates U-shaped skip connections (USC) to replace DSC, allocates varying numbers of MCFT blocks based on the semantic gap magnitude at different layers, and employs a spatial channel cross-attention (SCCA) module to facilitate the fusion of features between the decoder and USC. We evaluate USCT-UNet on four challenging datasets, and the results demonstrate that it effectively eliminates the semantic gap. Compared to using DSC, our USC and SCCA strategies achieve maximum improvements of 4.79% in the Dice coefficient, 5.70% in mean intersection over union (MIoU), and 3.26 in Hausdorff distance.\",\"PeriodicalId\":13419,\"journal\":{\"name\":\"IEEE Transactions on Neural Systems and Rehabilitation Engineering\",\"volume\":\"32 \",\"pages\":\"3782-3793\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695464\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Neural Systems and Rehabilitation Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10695464/\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10695464/","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}

引用次数: 0

摘要

医学图像分割是计算机辅助临床诊断的重要组成部分，最先进的模型通常是 U-Net 的变体。尽管这些模型很成功，但它们的跳接在编码器和解码器之间引入了不必要的语义鸿沟，这阻碍了它们实现临床应用所需的高精度的能力。随着时间的推移，人们对这种语义间隙及其有害影响的认识也在不断提高。然而，人们对这种语义鸿沟如何影响准确性和可靠性仍缺乏定量了解，这就强调了对有效缓解策略的需求。为此，我们首次对 U-Net 相应层之间的语义差距进行了定量评估，并确定了两个关键特征：1）直接跳过连接（DSC）表现出语义间隙，对模型的性能产生负面影响；2）不同层之间语义间隙的大小各不相同。基于这些发现，我们从跳转连接的角度重新审视了这一问题。我们引入了多通道融合转换器（Multichannel Fusion Transformer，MCFT），并提出了一种新颖的 USCT-UNet 架构，该架构采用 U 形跳接（USC）来取代 DSC，根据不同层的语义差距大小分配不同数量的 MCFT 块，并采用空间通道交叉注意（SCCA）模块来促进解码器和 USC 之间的特征融合。我们在四个具有挑战性的数据集上对 USCT-UNet 进行了评估，结果表明它能有效消除语义间隙。与使用 DSC 相比，我们的 USC 和 SCCA 策略在 Dice 系数方面实现了 4.79% 的最大改进，在平均交集大于联合（MIoU）方面实现了 5.70% 的改进，在 Hausdorff 距离方面实现了 3.26% 的改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

USCT-UNet: Rethinking the Semantic Gap in U-Net Network From U-Shaped Skip Connections With Multichannel Fusion Transformer

Medical image segmentation is a crucial component of computer-aided clinical diagnosis, with state-of-the-art models often being variants of U-Net. Despite their success, these models’ skip connections introduce an unnecessary semantic gap between the encoder and decoder, which hinders their ability to achieve the high precision required for clinical applications. Awareness of this semantic gap and its detrimental influences have increased over time. However, a quantitative understanding of how this semantic gap compromises accuracy and reliability remains lacking, emphasizing the need for effective mitigation strategies. In response, we present the first quantitative evaluation of the semantic gap between corresponding layers of U-Net and identify two key characteristics: 1) The direct skip connection (DSC) exhibits a semantic gap that negatively impacts models’ performance; 2) The magnitude of the semantic gap varies across different layers. Based on these findings, we re-examine this issue through the lens of skip connections. We introduce a Multichannel Fusion Transformer (MCFT) and propose a novel USCT-UNet architecture, which incorporates U-shaped skip connections (USC) to replace DSC, allocates varying numbers of MCFT blocks based on the semantic gap magnitude at different layers, and employs a spatial channel cross-attention (SCCA) module to facilitate the fusion of features between the decoder and USC. We evaluate USCT-UNet on four challenging datasets, and the results demonstrate that it effectively eliminates the semantic gap. Compared to using DSC, our USC and SCCA strategies achieve maximum improvements of 4.79% in the Dice coefficient, 5.70% in mean intersection over union (MIoU), and 3.26 in Hausdorff distance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Neural Systems and Rehabilitation Engineering 医学-工程：生物医学

CiteScore

8.60

自引率

8.20%

发文量

479

审稿时长

6-12 weeks

期刊介绍： Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.