Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang
{"title":"基于交叉窗口自关注的串并联卷积神经网络和变压器的三维医学图像分割","authors":"Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang","doi":"10.1049/cit2.12411","DOIUrl":null,"url":null,"abstract":"<p>Convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature. The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient, Intersection over Union, 95% Hausdorff distance and average symmetric surface distance.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"337-348"},"PeriodicalIF":8.4000,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12411","citationCount":"0","resultStr":"{\"title\":\"3D medical image segmentation using the serial–parallel convolutional neural network and transformer based on cross-window self-attention\",\"authors\":\"Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang\",\"doi\":\"10.1049/cit2.12411\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature. The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. 
The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient, Intersection over Union, 95% Hausdorff distance and average symmetric surface distance.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"10 2\",\"pages\":\"337-348\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2025-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12411\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12411\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12411","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
3D medical image segmentation using the serial–parallel convolutional neural network and transformer based on cross-window self-attention
The convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation owing to its excellent local feature extraction ability, but it is limited in capturing global features. The transformer extracts global information well, but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial–parallel network is proposed for accurate 3D medical image segmentation by combining the CNN and the transformer and promoting feature interactions across semantic levels. The core components of the proposed method are the cross-window self-attention based transformer (CWST) module and the multi-scale local enhanced (MLE) module. The CWST module enhances global context understanding by partitioning 3D images into non-overlapping windows and computing sparse global attention between windows. The MLE module selectively fuses features by computing voxel attention between different branch features and uses convolution to strengthen dense local information. Experiments on prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as the Dice similarity coefficient, intersection over union, 95% Hausdorff distance, and average symmetric surface distance.
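To make the window-attention idea in the abstract concrete, below is a minimal PyTorch-style sketch of cross-window self-attention on a 3D feature map. It only illustrates the general mechanism described above (partition the volume into non-overlapping windows, then compute sparse global attention between windows, here via one pooled token per window); the class name, the pooling step, and the residual broadcast are assumptions made for this sketch and do not reproduce the authors' CWST implementation.

```python
# Hypothetical sketch: sparse global attention between non-overlapping 3D windows.
# Not the paper's CWST module; shapes and design choices are illustrative assumptions.
import torch
import torch.nn as nn


class CrossWindowAttention3D(nn.Module):
    def __init__(self, channels: int, window_size: int = 4, num_heads: int = 4):
        super().__init__()
        self.ws = window_size
        self.norm = nn.LayerNorm(channels)
        # Standard multi-head self-attention, applied to window tokens only, so the
        # cost scales with the number of windows rather than the number of voxels.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W); D, H, W are assumed divisible by the window size.
        b, c, d, h, w = x.shape
        ws = self.ws
        # Partition into non-overlapping ws x ws x ws windows, pool each to one token.
        win = x.reshape(b, c, d // ws, ws, h // ws, ws, w // ws, ws)
        tokens = win.mean(dim=(3, 5, 7))            # (B, C, D/ws, H/ws, W/ws)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N_windows, C)
        # Sparse global attention: every window token attends to every other window.
        t = self.norm(tokens)
        attended, _ = self.attn(t, t, t)            # (B, N_windows, C)
        # Broadcast each window's attended token back to its voxels as a residual.
        attended = attended.transpose(1, 2).reshape(b, c, d // ws, h // ws, w // ws)
        for dim in (2, 3, 4):
            attended = attended.repeat_interleave(ws, dim)
        return x + attended


if __name__ == "__main__":
    volume = torch.randn(1, 32, 16, 16, 16)   # toy 3D feature map: (B, C, D, H, W)
    out = CrossWindowAttention3D(channels=32)(volume)
    print(out.shape)                          # torch.Size([1, 32, 16, 16, 16])
```

In the paper's full architecture, a module of this kind would presumably sit in the transformer branch alongside the CNN branch, with the MLE module fusing the two feature streams; the sketch above covers only the window-level attention itself.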
Journal introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.