DMD: Dual attention fusion and multi-scale feature fusion decoding for medical image segmentation

Xin Wang, Yanmin Niu
Journal of Radiation Research and Applied Sciences, Volume 18, Issue 4, Article 101873 (published 2025-09-09)
DOI: 10.1016/j.jrras.2025.101873
High-precision medical image segmentation plays a decisive role in clinical diagnosis, treatment efficacy evaluation, and the development of personalized therapeutic strategies. Although models based on U-Net and its variants have achieved remarkable results across various segmentation tasks, traditional convolutional neural networks (CNNs) are inherently limited by their local receptive fields, restricting their ability to capture long-range dependencies and global contextual information. In contrast, Transformer architectures leverage self-attention mechanisms to model global information, demonstrating superior performance in segmenting complex anatomical structures. However, most existing Transformer-based segmentation frameworks fail to fully integrate multi-scale semantic features extracted by the encoder during the decoding stage, leading to suboptimal feature reconstruction and reduced segmentation accuracy. To address these issues, we propose a novel decoder framework, DMD. First, an Inverted Bottleneck CNN module is introduced in the decoding phase to efficiently capture and enhance the rich semantic features output by the encoder. Second, to resolve the semantic mismatch between encoder and decoder multi-scale features, a Dual Attention Fusion Module (DAFM) is designed, which employs a joint spatial-and-channel attention mechanism to achieve adaptive alignment of cross-scale semantic information. Finally, to mitigate spatial resolution degradation caused by single-scale upsampling, a Multi-scale Feature Fusion Module (MFFM) is constructed to enable deep integration and collaborative optimization of features at different semantic levels.
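To make the joint spatial-and-channel attention idea concrete, the sketch below shows one common way such a fusion can be realized: channel attention from global average pooling and spatial attention from channel-wise pooling, applied to an encoder skip feature before it is merged with the decoder feature. This is an illustrative NumPy sketch of the general mechanism, not the authors' exact DAFM; the function name and the simple sigmoid gating are assumptions for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_attention_fuse(enc, dec):
    """Illustrative joint spatial-and-channel attention fusion.

    enc, dec: feature maps of shape (C, H, W). The encoder skip feature
    is reweighted by channel and spatial attention maps derived from the
    decoder feature, then added to the decoder feature.
    """
    # Channel attention: one weight per channel via global average pooling.
    ch_att = sigmoid(dec.mean(axis=(1, 2)))   # shape (C,)
    # Spatial attention: one weight per pixel via channel-wise averaging.
    sp_att = sigmoid(dec.mean(axis=0))        # shape (H, W)
    # Reweight the encoder feature along both axes, then fuse.
    return enc * ch_att[:, None, None] * sp_att[None, :, :] + dec

enc = np.random.rand(4, 8, 8)
dec = np.random.rand(4, 8, 8)
fused = dual_attention_fuse(enc, dec)
```

In a full model, the two attention branches would typically be learned (small convolutions or MLPs) rather than parameter-free pooling; the pooling version shown here only captures the data flow.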
Experimental results demonstrate that the proposed model achieves Dice scores of 82.93%, 91.32%, and 87.85% on the Synapse multi-organ segmentation, ACDC cardiac MRI segmentation, and Kvasir-SEG gastrointestinal endoscopy image segmentation datasets, respectively, outperforming existing methods and verifying the effectiveness and robustness of the framework in complex medical image segmentation tasks.
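The Dice scores reported above are instances of the Dice similarity coefficient, 2|A∩B| / (|A|+|B|), which rewards overlap between the predicted and ground-truth masks. A minimal sketch of its computation for binary masks (a standard formulation, not code from the paper; the small epsilon guards against empty masks):

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, target)  # 2*2 / (3+3) ≈ 0.6667
```

Multi-class benchmarks such as Synapse and ACDC typically compute this per organ or cardiac structure and report the mean.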
About the journal:
Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear and radiation science and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid-state science, engineering, and environmental and applied sciences.