CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation

IF 8.3 | Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2024-12-27 | DOI: 10.1109/TCSVT.2024.3523316

Lanhu Wu; Miao Zhang; Yongri Piao; Zhenyan Yao; Weibing Sun; Feng Tian; Huchuan Lu

Vol. 35, No. 5, pp. 4072-4086 | https://ieeexplore.ieee.org/document/10816601/

Citations: 0

Abstract

Automatic and precise medical image segmentation (MIS) is vital for clinical diagnosis and analysis. Current MIS methods mainly rely on convolutional neural networks (CNNs) or self-attention mechanisms (Transformers) for feature modeling. However, CNN-based methods suffer from inaccurate localization owing to their limited ability to capture global dependencies, while Transformer-based methods tend to produce coarse boundaries because they lack local emphasis. Although some CNN-Transformer hybrid methods combine the complementary local and global information for better performance, joining a CNN and a Transformer introduces numerous parameters and increases the computation cost. To this end, this paper proposes a CNN-Transformer rectified collaborative learning (CTRCL) framework that learns stronger CNN-based and Transformer-based models for MIS tasks via bi-directional knowledge transfer between them. Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy, which uses the ground truth to adaptively select and rectify the wrong regions in student soft labels for accurate knowledge transfer in the logit space. We also propose a class-aware feature-wise collaborative learning (CFCL) strategy, which achieves effective knowledge transfer between CNN-based and Transformer-based models in the feature space by giving their intermediate features a similar capability for category perception. Extensive experiments on three popular MIS benchmarks demonstrate that CTRCL outperforms most state-of-the-art collaborative learning methods under different evaluation metrics. The source code will be publicly available at https://github.com/LanhooNg/CTRCL.
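The idea of rectifying wrong regions in a peer model's soft labels before logit-space transfer can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state the rectification rule, so the code assumes the simplest one (replace the soft label with the one-hot ground truth wherever the peer's argmax prediction is wrong) and uses a plain per-pixel KL divergence as the transfer loss. The function names `rectify_soft_labels` and `kl_transfer_loss` are hypothetical.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def rectify_soft_labels(peer_logits, gt):
    """Assumed rule: keep the peer's soft label where its prediction is
    correct, and fall back to the one-hot ground truth where it is wrong.

    peer_logits: float array of shape (C, H, W)
    gt:          int array of shape (H, W) with values in [0, C)
    """
    probs = softmax(peer_logits, axis=0)               # (C, H, W)
    num_classes = probs.shape[0]
    wrong = probs.argmax(axis=0) != gt                 # (H, W) boolean mask
    one_hot = np.moveaxis(np.eye(num_classes)[gt], -1, 0)  # (C, H, W)
    rectified = np.where(wrong[None], one_hot, probs)
    return rectified, wrong


def kl_transfer_loss(learner_logits, target_probs, eps=1e-8):
    """Mean per-pixel KL(target || learner) in the logit space."""
    log_q = np.log(softmax(learner_logits, axis=0) + eps)
    log_p = np.log(target_probs + eps)
    return float((target_probs * (log_p - log_q)).sum(axis=0).mean())


# Toy example: 2 classes, a 1x2 image. The peer predicts class 0 at both
# pixels, but the second pixel actually belongs to class 1.
peer_logits = np.array([[[2.0, 3.0]],
                        [[0.0, 0.0]]])
gt = np.array([[0, 1]])

rectified, wrong = rectify_soft_labels(peer_logits, gt)
loss = kl_transfer_loss(np.zeros_like(peer_logits), rectified)
```

In a bi-directional setup, the same routine would be applied in both directions, with the CNN's rectified outputs supervising the Transformer and vice versa; how the two losses are weighted is left open here.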


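Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months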

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) covers all aspects of video technologies from a circuits and systems perspective. It encourages submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. It also welcomes contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Papers focusing on hardware and software design and implementation are highly valued.