Title: Effective Global Context Integration for Lightweight 3D Medical Image Segmentation
Authors: Qiang Qiao; Meixia Qu; Wenyu Wang; Bin Jiang; Qiang Guo
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 5, pp. 4661-4674 (impact factor 8.3; JCR Q1, Engineering, Electrical & Electronic)
Publication date: 2024-12-05
DOI: 10.1109/TCSVT.2024.3511926
Article link: https://ieeexplore.ieee.org/document/10778607/
Citations: 0
Abstract
Accurate and fast segmentation of 3D medical images is crucial in clinical analysis. CNNs struggle to capture long-range dependencies because of their inductive biases, whereas Transformers can capture global features but carry a considerable computational burden. Efficiently integrating global context with fine-grained detail is therefore key to precise segmentation. In this paper, we propose an effective and lightweight architecture named GCI-Net to address this issue. The key characteristics of GCI-Net are the global-guided feature enhancement strategy (GFES), which integrates global context and facilitates the learning of local information; 3D convolutional attention, which captures long-range dependencies; and a progressive downsampling module, which better preserves detailed information. GFES captures local information through global-guided feature fusion and a global-local contrastive loss. Together, these designs yield lower computational complexity and reliable performance improvements. The proposed model is trained and tested on four public datasets: MSD Brain Tumor, ACDC, BraTS2021, and MSD Lung. The experimental results show that, compared with several recent state-of-the-art (SOTA) methods, GCI-Net achieves superior computational efficiency with comparable or better segmentation performance. The code is available at https://github.com/qintianjian-lab/GCI-Net.
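The abstract does not spell out how global-guided feature fusion works; the authors' actual GFES is in the linked repository. Purely as a rough illustration of the general idea of gating local 3D features with a pooled global context (a common pattern, not the paper's exact design; the function name and shapes here are hypothetical), a minimal NumPy sketch could look like:

```python
import numpy as np

def global_guided_fusion(local_feat):
    """Illustrative sketch (not the authors' GFES): modulate a local 3D
    feature map with a global context vector obtained by pooling."""
    c = local_feat.shape[0]
    # Global context: average-pool each channel over the full D x H x W volume
    g = local_feat.reshape(c, -1).mean(axis=1)            # shape (C,)
    # Sigmoid turns the context into per-channel gates in (0, 1)
    gate = 1.0 / (1.0 + np.exp(-g))
    # Broadcast the gates back over the spatial dimensions
    return local_feat * gate[:, None, None, None]

# Toy feature map: 8 channels over a 4 x 16 x 16 volume
feat = np.random.randn(8, 4, 16, 16)
out = global_guided_fusion(feat)
print(out.shape)  # (8, 4, 16, 16)
```

Because the gates lie in (0, 1), the fusion only rescales local activations per channel, leaving the feature map's shape untouched; the paper's GFES additionally trains this interaction with a global-local contrastive loss, which this sketch omits.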
Journal description:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.