In recent years, deep convolutional neural networks (CNNs) have achieved great success in medical imaging. However, it is difficult to obtain accurate pathological information for clinical diagnosis and treatment from single-modality medical images alone. This study aims to provide an efficient multimodality whole heart segmentation method for the diagnosis of coronary heart disease.
We propose SFAM-TransUnet, a novel deep learning framework for multimodality whole heart segmentation that combines CNNs and transformers. The method integrates CNNs and vision transformers (ViTs) into a unified fusion framework. Specifically, a shallow feature fusion module is designed to connect MRI and CT images, providing a powerful and efficient multimodality fusion backbone for semantic segmentation. Furthermore, we propose a fusion ViT (FViT) module comprising self-attention (SA) and adaptive mutual boost attention (Ada-MBA) to enhance contextual information within and across modalities. The Ada-MBA module directs attention to semantically informative regions by computing both SA and cross-attention, improving the ability to capture context across the two modalities. Extensive experiments are conducted on the clinical Multi-Modality Whole Heart Segmentation datasets.
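For illustration only, the pairing of self-attention with CT-to-MRI cross-attention described above can be sketched as a minimal PyTorch module; the class name, the learnable mixing weight alpha, and the token shapes are assumptions for this sketch, not the authors' implementation.

import torch
import torch.nn as nn

class AdaMBASketch(nn.Module):
    # Hypothetical sketch of the Ada-MBA idea: blend intra-modality
    # self-attention with inter-modality cross-attention (assumed design).
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # adaptive mixing weight (assumed)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ct_tokens, mri_tokens):
        # Intra-modality context: self-attention over CT tokens.
        sa, _ = self.self_attn(ct_tokens, ct_tokens, ct_tokens)
        # Inter-modality context: CT queries attend to MRI keys/values.
        ca, _ = self.cross_attn(ct_tokens, mri_tokens, mri_tokens)
        # Adaptive blend of the two attention streams with a residual path.
        return self.norm(ct_tokens + self.alpha * sa + (1 - self.alpha) * ca)

# Usage: tokens of shape (batch, num_patches, dim) from each modality encoder.
fused = AdaMBASketch(dim=256)(torch.randn(2, 196, 256), torch.randn(2, 196, 256))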
We improved the whole heart segmentation Dice similarity coefficients (DSCs) to 0.902 (AA), 0.920 (LV-blood), 0.863 (LA-blood), and 0.837 (LV-myo); the Hausdorff distances (HDs) to 9.886 (AA), 9.947 (LV-blood), 11.911 (LA-blood), and 13.599 (LV-myo); the PSNR values to 33.577 (AA), 30.091 (LV-blood), 32.055 (LA-blood), and 29.837 (LV-myo); and the SSIM values to 0.901 (AA), 0.818 (LV-blood), 0.765 (LA-blood), and 0.743 (LV-myo). These results demonstrate that SFAM-TransUnet outperforms various alternative methods.
We propose SFAM-TransUnet, an efficient framework tailored for whole heart segmentation that combines CNNs and transformers into a powerful multimodality fusion network, improving the performance of whole heart semantic segmentation. The results demonstrate the efficacy of SFAM-TransUnet in integrating relevant information across modalities in multimodal tasks.


