Jin Yang, Peijie Qiu, Yichi Zhang, Daniel S. Marcus, Aristeidis Sotiras
Biomedical Signal Processing and Control, Volume 113, Article 108837. Published 2025-10-10. DOI: 10.1016/j.bspc.2025.108837 (Journal Article; JCR Q1, Engineering, Biomedical; Impact Factor 4.9)
D-Net: Dynamic large kernel with dynamic feature fusion for volumetric medical image segmentation
Hierarchical Vision Transformers (ViTs) have achieved significant success in medical image segmentation due to their large receptive field and ability to leverage long-range contextual information. Convolutional neural networks (CNNs) can also deliver a large receptive field by using large convolutional kernels. However, because they use fixed-size kernels, CNNs with large kernels remain limited in their ability to adaptively capture multi-scale features from organs that vary greatly in shape and size. They are also unable to utilize global contextual information efficiently. To address these limitations, we propose lightweight Dynamic Large Kernel (DLK) and Dynamic Feature Fusion (DFF) modules. DLK employs multiple large kernels with varying kernel sizes and dilation rates to capture multi-scale features, then applies a dynamic selection mechanism to adaptively highlight the most important channel and spatial features based on global information. DFF is proposed to adaptively fuse multi-scale local feature maps based on their global information. We incorporated DLK and DFF into a hierarchical ViT architecture to leverage its scaling behavior; however, the feature embedding step in ViT architectures prevents such networks from extracting low-level features effectively. To tackle this limitation, we propose a Salience layer that extracts low-level features from images at their original dimensions, without feature embedding, and employs a Channel Mixer to capture global representations effectively. We further incorporated the Salience layer into the hierarchical ViT architecture to develop a novel network, termed D-Net. D-Net effectively utilizes a multi-scale large receptive field and adaptively harnesses global contextual information. Extensive experimental results demonstrate its superior segmentation performance compared to state-of-the-art models, with comparatively lower computational complexity.
The code is made available at https://github.com/sotiraslab/DLK.
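The abstract's two core ideas can be illustrated numerically. First, a dilated kernel of size k with dilation d covers the same spatial extent as a dense kernel of size k + (k - 1)(d - 1), which is how DLK obtains multi-scale receptive fields from a few large kernels. Second, the dynamic selection/fusion idea amounts to re-weighting multi-scale branch outputs with weights derived from global information. The sketch below is a hypothetical, simplified illustration of these two mechanisms, not the authors' implementation (their code is at the GitHub link above); the softmax-over-global-averages gating is an assumption standing in for the paper's dynamic selection mechanism.

```python
import numpy as np

def effective_kernel_size(k: int, d: int) -> int:
    """Spatial extent covered by a kernel of size k with dilation rate d."""
    return k + (k - 1) * (d - 1)

def dynamic_fusion(branches):
    """Fuse multi-scale feature maps with softmax weights computed from each
    branch's global average — a toy stand-in for the dynamic selection
    mechanism described in the abstract."""
    branches = np.stack(branches)                    # (n_branches, H, W)
    global_desc = branches.mean(axis=(1, 2))         # one scalar per branch
    exp = np.exp(global_desc - global_desc.max())    # numerically stable softmax
    weights = exp / exp.sum()
    return np.tensordot(weights, branches, axes=1)   # weighted sum -> (H, W)

# A 5x5 kernel at dilation 3 spans the same extent as a dense 13x13 kernel.
print(effective_kernel_size(5, 3))  # -> 13

# Fuse two toy "multi-scale" feature maps; the result lies between them.
fused = dynamic_fusion([np.ones((4, 4)), 3 * np.ones((4, 4))])
print(fused.shape)  # -> (4, 4)
```

Stacking a few such dilated branches (e.g. k = 5 with d = 1, 2, 3) yields receptive fields of 5, 9, and 13 at the cost of a single 5x5 kernel each, which is the efficiency argument behind large-kernel designs.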
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.