Zhengshi Chen, Juanjuan He, Xiaoming Liu, Jinshan Tang
DOI: 10.1016/j.bspc.2025.108709
Journal: Biomedical Signal Processing and Control, Volume 112, Article 108709 (Q1, Engineering, Biomedical)
Published: 2025-09-28 (Journal Article)
MDCT-Unet: A dual-encoder network combining multi-scale dilated convolutions with Transformer for medical image segmentation
Precise medical image segmentation is crucial in clinical diagnosis and pathological analysis. Most segmentation methods are based on U-shaped convolutional neural networks (U-Net). Although U-Net performs well in medical image segmentation, as a CNN-based method it has difficulty establishing long-range pixel dependencies and has a constrained receptive field, which restricts segmentation accuracy. Many models address this issue by incorporating Transformer modules into U-Net architectures to better capture long-range dependencies. However, these methods often suffer from simplistic feature fusion techniques and limited receptive fields for local features. To address these challenges, we propose a dual-encoder framework, named MDCT-Unet, which combines Swin-Transformer and CNN for enhanced medical image segmentation. The framework introduces a novel dynamic feature fusion module to better integrate local and global features. By combining channel and spatial attention mechanisms and inducing competition between them, we enhance the coupling of these two types of features, ensuring richer information representation. In addition, to better extract multi-scale local features from medical images, we design a dilated convolution encoder (DCE) as the CNN branch of our model. By incorporating dilated convolutions with varying receptive fields, the DCE captures rich local features at multiple scales, thereby enhancing the model's ability to segment challenging regions such as boundaries and small organs. We conducted extensive experiments on four datasets: Synapse, ISIC2018, CHASEDB1, and MMWHS. The experimental results show that our method outperforms most current medical image segmentation methods both quantitatively and qualitatively.
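The abstract describes two concrete mechanisms: stacking dilated convolutions with varying dilation rates to enlarge the receptive field, and fusing local (CNN) and global (Transformer) features through competing attention gates. The pure-Python sketch below illustrates both ideas in simplified 1D form; the function names and the softmax-competition gating are illustrative assumptions, not the paper's actual implementation.

```python
import math

def receptive_field(kernel_size, dilations):
    """Effective receptive field of sequentially stacked dilated convolutions.
    Each layer with dilation d widens the field by (kernel_size - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

def dilated_conv1d(signal, kernel, dilation):
    """'Valid'-mode 1D dilated convolution (cross-correlation), pure Python.
    A dilation of d samples the input every d positions under the kernel."""
    k = len(kernel)
    span = (k - 1) * dilation  # input positions spanned by the dilated kernel
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span)
    ]

def competitive_fusion(local_feat, global_feat, local_score, global_score):
    """Fuse two feature vectors with softmax-normalized gates: the two
    weights sum to 1, so strengthening one branch suppresses the other."""
    e_l, e_g = math.exp(local_score), math.exp(global_score)
    w_l = e_l / (e_l + e_g)
    w_g = 1.0 - w_l
    return [w_l * a + w_g * b for a, b in zip(local_feat, global_feat)]
```

For a 3-tap kernel with dilation rates (1, 2, 4), `receptive_field(3, [1, 2, 4])` yields 15, showing how stacked dilations widen coverage without adding parameters; this is the general principle the DCE exploits, not its exact configuration.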
Journal overview:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.