A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images

IF 3.2 · CAS Tier 2 (Medicine) · JCR Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Medical Physics · Pub Date: 2024-08-13 · DOI: 10.1002/mp.17354
Yuzhou Zhuang, Hong Liu, Wei Fang, Guangzhi Ma, Sisi Sun, Yunfeng Zhu, Xu Zhang, Chuanbin Ge, Wenyang Chen, Jiaosong Long, Enmin Song
{"title":"A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images","authors":"Yuzhou Zhuang,&nbsp;Hong Liu,&nbsp;Wei Fang,&nbsp;Guangzhi Ma,&nbsp;Sisi Sun,&nbsp;Yunfeng Zhu,&nbsp;Xu Zhang,&nbsp;Chuanbin Ge,&nbsp;Wenyang Chen,&nbsp;Jiaosong Long,&nbsp;Enmin Song","doi":"10.1002/mp.17354","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, due to the indistinct boundaries between tumor sub-regions and the heterogeneous appearances of gliomas in volumetric MR scans, designing a reliable and automated glioma segmentation method is still challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they widely lack the capability of hierarchical interactions between different modalities and cannot effectively learn comprehensive feature representations related to all glioma sub-regions.</p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>To overcome these problems, in this paper, we propose a 3D hierarchical cross-modality interaction network (HCMINet) using Transformers and convolutions for accurate multi-modal glioma segmentation, which leverages an effective hierarchical cross-modality interaction strategy to sufficiently learn modality-specific and modality-shared knowledge correlated to glioma sub-region segmentation from multi-parametric MR images.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>In the HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder to hierarchically encode and fuse heterogeneous multi-modal features by Transformer-based intra-modal embeddings and inter-modal interactions in multiple encoding stages, which effectively captures complex cross-modality correlations while modeling global contexts. Then, we collaborate an HCMITrans encoder with a modality-shared convolutional encoder to construct the dual-encoder architecture in the encoding stage, which can learn the abundant contextual information from global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder to progressively fuse local and global features extracted by the dual-encoder architecture, which utilizes the local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among the decoding features.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Extensive experiments are conducted on two public and competitive glioma benchmark datasets, including the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. Results show that our proposed method outperforms existing Transformer-based and CNN-based methods using other multi-modal fusion strategies in our experiments. 
Specifically, the proposed HCMINet achieves state-of-the-art mean DSC values of 85.33% and 91.09% on the BraTS2020 online validation dataset and the BraTS2021 local testing dataset, respectively.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>Our proposed method can accurately and automatically segment glioma regions from multi-parametric MR images, which is beneficial for the quantitative analysis of brain gliomas and helpful for reducing the annotation burden of neuroradiologists.</p>\n </section>\n </div>","PeriodicalId":18384,"journal":{"name":"Medical physics","volume":"51 11","pages":"8371-8389"},"PeriodicalIF":3.2000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/mp.17354","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract

Background

Precise glioma segmentation from multi-parametric magnetic resonance (MR) images is essential for brain glioma diagnosis. However, due to the indistinct boundaries between tumor sub-regions and the heterogeneous appearance of gliomas in volumetric MR scans, designing a reliable, automated glioma segmentation method remains challenging. Although existing 3D Transformer-based or convolution-based segmentation networks have obtained promising results via multi-modal feature fusion strategies or contextual learning methods, they generally lack the capability for hierarchical interaction between different modalities and cannot effectively learn comprehensive feature representations covering all glioma sub-regions.

Purpose

To overcome these problems, we propose a 3D hierarchical cross-modality interaction network (HCMINet) that uses Transformers and convolutions for accurate multi-modal glioma segmentation. HCMINet leverages an effective hierarchical cross-modality interaction strategy to fully learn the modality-specific and modality-shared knowledge relevant to glioma sub-region segmentation from multi-parametric MR images.

Methods

In HCMINet, we first design a hierarchical cross-modality interaction Transformer (HCMITrans) encoder that hierarchically encodes and fuses heterogeneous multi-modal features through Transformer-based intra-modal embeddings and inter-modal interactions across multiple encoding stages, effectively capturing complex cross-modality correlations while modeling global contexts. We then pair the HCMITrans encoder with a modality-shared convolutional encoder to form a dual-encoder architecture that learns rich contextual information from both global and local perspectives. Finally, in the decoding stage, we present a progressive hybrid context fusion (PHCF) decoder that progressively fuses the local and global features extracted by the dual encoders, using a local-global context fusion (LGCF) module to efficiently alleviate the contextual discrepancy among decoding features.
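The abstract does not spell out the layer-level design, but the core idea of intra-modal embedding followed by inter-modal interaction can be sketched concisely. Below is a minimal, hypothetical PyTorch sketch of one such encoding stage, assuming four MR modalities (e.g., T1, T1ce, T2, FLAIR) whose 3D patches have already been flattened into token sequences; all class, parameter, and dimension choices are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class CrossModalityInteractionStage(nn.Module):
    """Hypothetical HCMITrans-style stage: per-modality self-attention
    (intra-modal embedding) followed by cross-attention over the other
    modalities (inter-modal interaction)."""

    def __init__(self, dim: int = 96, num_heads: int = 4, num_modalities: int = 4):
        super().__init__()
        self.num_modalities = num_modalities
        # Intra-modal embedding: one self-attention block per modality.
        self.intra_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_modalities)
        )
        # Inter-modal interaction: each modality queries the others' tokens.
        self.inter_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_modalities)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: list[torch.Tensor]) -> list[torch.Tensor]:
        # tokens: one (batch, num_tokens, dim) sequence per modality.
        # 1) Intra-modal self-attention models global context within a modality.
        intra = []
        for x, attn in zip(tokens, self.intra_attn):
            y, _ = attn(self.norm1(x), self.norm1(x), self.norm1(x))
            intra.append(x + y)  # residual connection
        # 2) Inter-modal cross-attention: modality i attends to all others,
        #    mixing modality-specific and modality-shared information.
        out = []
        for i, attn in enumerate(self.inter_attn):
            others = torch.cat(
                [intra[j] for j in range(self.num_modalities) if j != i], dim=1
            )
            y, _ = attn(self.norm2(intra[i]), others, others)
            out.append(intra[i] + y)
        return out


# Toy usage: 4 modalities, batch of 2, 64 patch tokens, 96-dim embeddings.
stage = CrossModalityInteractionStage()
feats = [torch.randn(2, 64, 96) for _ in range(4)]
fused = stage(feats)  # list of 4 tensors, each still (2, 64, 96)
```

Stacking several such stages at decreasing spatial resolutions would give the hierarchical behavior the abstract describes; the convolutional branch and the PHCF/LGCF decoder are omitted here for brevity.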

Results

Extensive experiments are conducted on two public, competitive glioma benchmark datasets: the BraTS2020 dataset with 494 patients and the BraTS2021 dataset with 1251 patients. The results show that our method outperforms existing Transformer-based and CNN-based methods built on other multi-modal fusion strategies. Specifically, HCMINet achieves state-of-the-art mean DSC values of 85.33% on the BraTS2020 online validation dataset and 91.09% on the BraTS2021 local testing dataset.
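For reference, the DSC reported above is the standard Dice similarity coefficient, DSC = 2|A∩B| / (|A| + |B|), computed per tumor sub-region and averaged; in BraTS evaluations the sub-regions are typically the whole tumor, tumor core, and enhancing tumor. A minimal NumPy sketch for a single binary mask (the function name and toy masks are illustrative):

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient for binary 3D masks:
    DSC = 2 * |A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    # Convention: two empty masks count as a perfect match.
    return 2.0 * intersection / denom if denom > 0 else 1.0


# Toy example: two half-overlapping 4x4x4 masks -> DSC = 0.5.
a = np.zeros((4, 4, 4), dtype=bool); a[:2] = True
b = np.zeros((4, 4, 4), dtype=bool); b[1:3] = True
print(dice_coefficient(a, b))  # 0.5
```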

Conclusions

Our proposed method accurately and automatically segments glioma regions from multi-parametric MR images, which benefits the quantitative analysis of brain gliomas and helps reduce the annotation burden on neuroradiologists.
