DMD: Dual attention fusion and multi-scale feature fusion decoding for medical image segmentation

Xin Wang, Yanmin Niu
Journal of Radiation Research and Applied Sciences, Volume 18, Issue 4, Article 101873 (published 2025-09-09)
DOI: 10.1016/j.jrras.2025.101873
High-precision medical image segmentation plays a decisive role in clinical diagnosis, treatment efficacy evaluation, and the development of personalized therapeutic strategies. Although models based on U-Net and its variants have achieved remarkable results across various segmentation tasks, traditional convolutional neural networks (CNNs) are inherently limited by their local receptive fields, restricting their ability to capture long-range dependencies and global contextual information. In contrast, Transformer architectures leverage self-attention mechanisms to model global information, demonstrating superior performance in segmenting complex anatomical structures. However, most existing Transformer-based segmentation frameworks fail to fully integrate multi-scale semantic features extracted by the encoder during the decoding stage, leading to suboptimal feature reconstruction and reduced segmentation accuracy. To address these issues, we propose a novel decoder framework, DMD. First, an Inverted Bottleneck CNN module is introduced in the decoding phase to efficiently capture and enhance the rich semantic features output by the encoder. Second, to resolve the semantic mismatch between encoder and decoder multi-scale features, a Dual Attention Fusion Module (DAFM) is designed, which employs a joint spatial-and-channel attention mechanism to achieve adaptive alignment of cross-scale semantic information. Finally, to mitigate spatial resolution degradation caused by single-scale upsampling, a Multi-scale Feature Fusion Module (MFFM) is constructed to enable deep integration and collaborative optimization of features at different semantic levels.
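To make the joint spatial-and-channel attention idea concrete, the sketch below shows one common way such a fusion can be realized: channel attention from global average pooling and spatial attention from channel-wise pooling, applied to an encoder skip feature before it is merged with the decoder feature. This is an illustrative NumPy sketch of the general mechanism, not the authors' exact DAFM; the function name and the simple sigmoid gating are assumptions for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_attention_fuse(enc, dec):
    """Illustrative joint spatial-and-channel attention fusion.

    enc, dec: feature maps of shape (C, H, W). The encoder skip feature
    is reweighted by channel and spatial attention maps derived from the
    decoder feature, then added to the decoder feature.
    """
    # Channel attention: one weight per channel via global average pooling.
    ch_att = sigmoid(dec.mean(axis=(1, 2)))   # shape (C,)
    # Spatial attention: one weight per pixel via channel-wise averaging.
    sp_att = sigmoid(dec.mean(axis=0))        # shape (H, W)
    # Reweight the encoder feature along both axes, then fuse.
    return enc * ch_att[:, None, None] * sp_att[None, :, :] + dec

enc = np.random.rand(4, 8, 8)
dec = np.random.rand(4, 8, 8)
fused = dual_attention_fuse(enc, dec)
```

In a full model, the two attention branches would typically be learned (small convolutions or MLPs) rather than parameter-free pooling; the pooling version shown here only captures the data flow.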
Experimental results demonstrate that the proposed model achieves Dice scores of 82.93%, 91.32%, and 87.85% on the Synapse multi-organ segmentation, ACDC cardiac MRI segmentation, and Kvasir-SEG gastrointestinal endoscopy image segmentation datasets, respectively, outperforming existing methods and verifying the effectiveness and robustness of the framework in complex medical image segmentation tasks.
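The Dice scores reported above are instances of the Dice similarity coefficient, 2|A∩B| / (|A|+|B|), which rewards overlap between the predicted and ground-truth masks. A minimal sketch of its computation for binary masks (a standard formulation, not code from the paper; the small epsilon guards against empty masks):

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, target)  # 2*2 / (3+3) ≈ 0.6667
```

Multi-class benchmarks such as Synapse and ACDC typically compute this per organ or cardiac structure and report the mean.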
About the journal:
Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear and radiation science and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid-state science, engineering, and environmental and applied sciences.