CMS-net: Edge-aware multimodal MRI feature fusion for brain tumor segmentation

IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Chunjie Lv , Biyuan Li , Xiuwei Wang , Pengfei Cai , Bo Yang , Xuefeng Jia , Jun Yan
{"title":"CMS-net:边缘感知多模态MRI特征融合用于脑肿瘤分割","authors":"Chunjie Lv ,&nbsp;Biyuan Li ,&nbsp;Xiuwei Wang ,&nbsp;Pengfei Cai ,&nbsp;Bo Yang ,&nbsp;Xuefeng Jia ,&nbsp;Jun Yan","doi":"10.1016/j.imavis.2025.105481","DOIUrl":null,"url":null,"abstract":"<div><div>With the growing application of artificial intelligence in medical image processing, multimodal MRI brain tumor segmentation has become crucial for clinical diagnosis and treatment. Accurate segmentation relies heavily on the effective utilization of multimodal information. However, most existing methods primarily focus on global and local deep semantic features, often overlooking critical aspects such as edge information and cross-channel correlations. To address these limitations while retaining the strengths of existing methods, we propose a novel brain tumor segmentation approach: an edge-aware feature fusion model based on a dual-encoder architecture. CMS-Net is a novel brain tumor segmentation model that integrates edge-aware fusion, cross-channel interaction, and spatial state feature extraction to fully leverage multimodal information for improved segmentation accuracy. The architecture comprises two main components: an encoder and a decoder. The encoder utilizes both convolutional downsampling and Smart Swin Transformer downsampling, with the latter employing Shifted Spatial Multi-Head Self-Attention (SSW-MSA) to capture global features and enhance long-range dependencies. The decoder reconstructs the image via the CMS-Block, which consists of three key modules: the Multi-Scale Deep Convolutional Cross-Channel Attention module (MDTA), the Spatial State Module (SSM), and the Boundary-Aware Feature Fusion module (SWA). CMS-Net's dual-encoder architecture allows for deep extraction of both local and global features, enhancing segmentation performance. MDTA generates attention maps through cross-channel covariance, while SSM models spatial context to improve the understanding of complex structures. The SWA module, combining SSW-MSA with pooling, subtraction, and convolution, facilitates feature fusion and edge extraction. Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. Experimental results on the BraTS2018, BraTS2019, and BraTS2020 datasets demonstrate that CMS-Net effectively integrates spatial state, cross-channel, and boundary information, significantly improving multimodal brain tumor segmentation accuracy.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105481"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CMS-net: Edge-aware multimodal MRI feature fusion for brain tumor segmentation\",\"authors\":\"Chunjie Lv ,&nbsp;Biyuan Li ,&nbsp;Xiuwei Wang ,&nbsp;Pengfei Cai ,&nbsp;Bo Yang ,&nbsp;Xuefeng Jia ,&nbsp;Jun Yan\",\"doi\":\"10.1016/j.imavis.2025.105481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the growing application of artificial intelligence in medical image processing, multimodal MRI brain tumor segmentation has become crucial for clinical diagnosis and treatment. Accurate segmentation relies heavily on the effective utilization of multimodal information. However, most existing methods primarily focus on global and local deep semantic features, often overlooking critical aspects such as edge information and cross-channel correlations. 
To address these limitations while retaining the strengths of existing methods, we propose a novel brain tumor segmentation approach: an edge-aware feature fusion model based on a dual-encoder architecture. CMS-Net is a novel brain tumor segmentation model that integrates edge-aware fusion, cross-channel interaction, and spatial state feature extraction to fully leverage multimodal information for improved segmentation accuracy. The architecture comprises two main components: an encoder and a decoder. The encoder utilizes both convolutional downsampling and Smart Swin Transformer downsampling, with the latter employing Shifted Spatial Multi-Head Self-Attention (SSW-MSA) to capture global features and enhance long-range dependencies. The decoder reconstructs the image via the CMS-Block, which consists of three key modules: the Multi-Scale Deep Convolutional Cross-Channel Attention module (MDTA), the Spatial State Module (SSM), and the Boundary-Aware Feature Fusion module (SWA). CMS-Net's dual-encoder architecture allows for deep extraction of both local and global features, enhancing segmentation performance. MDTA generates attention maps through cross-channel covariance, while SSM models spatial context to improve the understanding of complex structures. The SWA module, combining SSW-MSA with pooling, subtraction, and convolution, facilitates feature fusion and edge extraction. Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. Experimental results on the BraTS2018, BraTS2019, and BraTS2020 datasets demonstrate that CMS-Net effectively integrates spatial state, cross-channel, and boundary information, significantly improving multimodal brain tumor segmentation accuracy.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"156 \",\"pages\":\"Article 105481\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625000691\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000691","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

With the growing application of artificial intelligence in medical image processing, multimodal MRI brain tumor segmentation has become crucial for clinical diagnosis and treatment. Accurate segmentation relies heavily on the effective utilization of multimodal information. However, most existing methods primarily focus on global and local deep semantic features, often overlooking critical aspects such as edge information and cross-channel correlations. To address these limitations while retaining the strengths of existing methods, we propose a novel brain tumor segmentation approach: an edge-aware feature fusion model based on a dual-encoder architecture. CMS-Net is a novel brain tumor segmentation model that integrates edge-aware fusion, cross-channel interaction, and spatial state feature extraction to fully leverage multimodal information for improved segmentation accuracy. The architecture comprises two main components: an encoder and a decoder. The encoder utilizes both convolutional downsampling and Smart Swin Transformer downsampling, with the latter employing Shifted Spatial Multi-Head Self-Attention (SSW-MSA) to capture global features and enhance long-range dependencies. The decoder reconstructs the image via the CMS-Block, which consists of three key modules: the Multi-Scale Deep Convolutional Cross-Channel Attention module (MDTA), the Spatial State Module (SSM), and the Boundary-Aware Feature Fusion module (SWA). CMS-Net's dual-encoder architecture allows for deep extraction of both local and global features, enhancing segmentation performance. MDTA generates attention maps through cross-channel covariance, while SSM models spatial context to improve the understanding of complex structures. The SWA module, combining SSW-MSA with pooling, subtraction, and convolution, facilitates feature fusion and edge extraction. Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. Experimental results on the BraTS2018, BraTS2019, and BraTS2020 datasets demonstrate that CMS-Net effectively integrates spatial state, cross-channel, and boundary information, significantly improving multimodal brain tumor segmentation accuracy.
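The abstract states that MDTA generates attention maps through cross-channel covariance rather than spatial attention, but gives no implementation details. The sketch below illustrates one common way to realize channel-wise covariance attention in PyTorch; the class name, head count, and normalization choices are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelCovarianceAttention(nn.Module):
    """Illustrative sketch (not the paper's MDTA): attention weights are
    computed from the C x C similarity of queries and keys rather than over
    spatial positions, so cost scales with channels instead of pixels."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        # 1x1 convolutions project the feature map into queries, keys, and values.
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable per-head temperature, a common choice for channel attention.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, spatial) and L2-normalize
        # so q @ k^T behaves like a channel-wise covariance/correlation matrix.
        def split(t):
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(split, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (b, heads, c/h, c/h)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.proj(out)
```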
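The SWA module is described as combining SSW-MSA with pooling, subtraction, and convolution to extract edges. As a rough, hypothetical illustration of the pooling-and-subtraction idea only (not the authors' implementation), subtracting a locally averaged feature map from the original suppresses flat regions and keeps boundary-like responses, which a convolution then refines:

```python
import torch
import torch.nn as nn

class BoundaryCueExtractor(nn.Module):
    """Hypothetical sketch of pooling-subtraction edge cues; names and layer
    choices are illustrative, not taken from the paper."""

    def __init__(self, channels: int, pool_size: int = 3):
        super().__init__()
        # Stride-1 average pooling acts as a local smoothing filter.
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Edge-like residual: original features minus their smoothed version.
        edge = x - self.pool(x)
        return self.refine(edge)
```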
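The abstract also notes that Dice and Focal loss functions were introduced to optimize cross-channel and spatial feature extraction. A generic combination of soft Dice loss and Focal loss for a single binary tumor mask might look like the following sketch; the weighting and hyper-parameters are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    gamma: float = 2.0,
                    alpha: float = 0.25,
                    smooth: float = 1.0,
                    focal_weight: float = 1.0) -> torch.Tensor:
    """Binary Dice + Focal loss.

    logits:  (N, 1, H, W) raw network outputs.
    targets: (N, 1, H, W) binary ground-truth masks.
    Hyper-parameters are illustrative defaults, not the paper's values.
    """
    probs = torch.sigmoid(logits)

    # Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), computed per sample then averaged.
    dims = (1, 2, 3)
    intersection = (probs * targets).sum(dims)
    dice = (2.0 * intersection + smooth) / (probs.sum(dims) + targets.sum(dims) + smooth)
    dice_loss = 1.0 - dice.mean()

    # Focal loss: down-weights easy voxels so training focuses on hard boundaries.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    focal_loss = (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

    return dice_loss + focal_weight * focal_loss
```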
Source journal
Image and Vision Computing
Category: Engineering & Technology – Engineering: Electrical & Electronic
CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Average review time: 7.8 months
Aims and scope: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.