Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang
{"title":"基于交叉窗口自关注的串并联卷积神经网络和变压器的三维医学图像分割","authors":"Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang","doi":"10.1049/cit2.12411","DOIUrl":null,"url":null,"abstract":"<p>Convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature. The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient, Intersection over Union, 95% Hausdorff distance and average symmetric surface distance.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"337-348"},"PeriodicalIF":8.4000,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12411","citationCount":"0","resultStr":"{\"title\":\"3D medical image segmentation using the serial–parallel convolutional neural network and transformer based on cross-window self-attention\",\"authors\":\"Bin Yu, Quan Zhou, Li Yuan, Huageng Liang, Pavel Shcherbakov, Xuming Zhang\",\"doi\":\"10.1049/cit2.12411\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation due to its excellent local feature extraction ability but it faces limitations in capturing the global feature. The transformer can extract the global information well but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial and parallel network is proposed for the accurate 3D medical image segmentation by combining CNN and transformer and promoting feature interactions across various semantic levels. The core components of the proposed method include the cross window self-attention based transformer (CWST) and multi-scale local enhanced (MLE) modules. The CWST module enhances the global context understanding by partitioning 3D images into non-overlapping windows and calculating sparse global attention between windows. The MLE module selectively fuses features by computing the voxel attention between different branch features, and uses convolution to strengthen the dense local information. 
The experiments on the prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as dice similarity coefficient, Intersection over Union, 95% Hausdorff distance and average symmetric surface distance.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"10 2\",\"pages\":\"337-348\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2025-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12411\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12411\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12411","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
3D medical image segmentation using the serial–parallel convolutional neural network and transformer based on cross-window self-attention
The convolutional neural network (CNN) with the encoder–decoder structure is popular in medical image segmentation owing to its excellent local feature extraction ability, but it is limited in capturing global features. The transformer extracts global information well, but adapting it to small medical datasets is challenging and its computational complexity can be heavy. In this work, a serial–parallel network is proposed for accurate 3D medical image segmentation by combining the CNN and the transformer and promoting feature interactions across semantic levels. The core components of the proposed method are the cross-window self-attention based transformer (CWST) module and the multi-scale local enhanced (MLE) module. The CWST module enhances global context understanding by partitioning 3D images into non-overlapping windows and computing sparse global attention between windows. The MLE module selectively fuses features by computing voxel attention between different branch features and uses convolution to strengthen dense local information. Experiments on prostate, atrium, and pancreas MR/CT image datasets consistently demonstrate the advantage of the proposed method over six popular segmentation models in both qualitative evaluation and quantitative indexes such as the Dice similarity coefficient, intersection over union, 95% Hausdorff distance, and average symmetric surface distance.
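To make the window-attention idea in the abstract concrete, below is a minimal PyTorch-style sketch of cross-window self-attention on a 3D feature map. It only illustrates the general mechanism described above (partition the volume into non-overlapping windows, then compute sparse global attention between windows, here via one pooled token per window); the class name, the pooling step, and the residual broadcast are assumptions made for this sketch and do not reproduce the authors' CWST implementation.

```python
# Hypothetical sketch: sparse global attention between non-overlapping 3D windows.
# Not the paper's CWST module; shapes and design choices are illustrative assumptions.
import torch
import torch.nn as nn


class CrossWindowAttention3D(nn.Module):
    def __init__(self, channels: int, window_size: int = 4, num_heads: int = 4):
        super().__init__()
        self.ws = window_size
        self.norm = nn.LayerNorm(channels)
        # Standard multi-head self-attention, applied to window tokens only, so the
        # cost scales with the number of windows rather than the number of voxels.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W); D, H, W are assumed divisible by the window size.
        b, c, d, h, w = x.shape
        ws = self.ws
        # Partition into non-overlapping ws x ws x ws windows, pool each to one token.
        win = x.reshape(b, c, d // ws, ws, h // ws, ws, w // ws, ws)
        tokens = win.mean(dim=(3, 5, 7))            # (B, C, D/ws, H/ws, W/ws)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N_windows, C)
        # Sparse global attention: every window token attends to every other window.
        t = self.norm(tokens)
        attended, _ = self.attn(t, t, t)            # (B, N_windows, C)
        # Broadcast each window's attended token back to its voxels as a residual.
        attended = attended.transpose(1, 2).reshape(b, c, d // ws, h // ws, w // ws)
        for dim in (2, 3, 4):
            attended = attended.repeat_interleave(ws, dim)
        return x + attended


if __name__ == "__main__":
    volume = torch.randn(1, 32, 16, 16, 16)   # toy 3D feature map: (B, C, D, H, W)
    out = CrossWindowAttention3D(channels=32)(volume)
    print(out.shape)                          # torch.Size([1, 32, 16, 16, 16])
```

In the paper's full architecture, a module of this kind would presumably sit in the transformer branch alongside the CNN branch, with the MLE module fusing the two feature streams; the sketch above covers only the window-level attention itself.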
Journal introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.