Bridging multi-level gaps: Bidirectional reciprocal cycle framework for text-guided label-efficient segmentation in echocardiography

IF 10.7 · Zone 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Medical Image Analysis · Pub Date: 2025-03-07 · DOI: 10.1016/j.media.2025.103536

Zhenxuan Zhang, Heye Zhang, Tieyong Zeng, Guang Yang, Zhenquan Shi, Zhifan Gao

Medical Image Analysis, vol. 102, Article 103536 (Journal Article). https://www.sciencedirect.com/science/article/pii/S1361841525000830

Citations: 0

Abstract

Text-guided visual understanding is a promising approach to downstream task learning in echocardiography. Because text can embed highly condensed clinical information into predictions for visual tasks, it can reduce reliance on large labeled datasets and make clinical tasks easier to learn. Methods based on contrastive language-image pretraining (CLIP) extract image–text features through a contrastive pretraining process over matched text and images, then adapt the pretrained network parameters to improve downstream task performance with text guidance. However, these methods still face a multi-level gap between image and text. The gap stems mainly from spatial-level, contextual-level, and domain-level differences, and it makes medical image–text pairs and dense prediction tasks difficult to handle. We therefore propose a bidirectional reciprocal cycle (BRC) framework to bridge these multi-level gaps. First, BRC constructs pyramid reciprocal alignments of embedded global and local image–text feature representations, which matches complex medical expertise with the corresponding phenomena. Second, BRC enforces consistency between forward inference and the reverse mapping (i.e., text → feature is consistent with feature → text or feature → image), which strengthens the perception of the contextual relationship between input data and features. Third, BRC adapts to the specific downstream segmentation task by using a cross-modal attention mechanism to embed complex text information that directly guides the task. Compared with 22 existing methods, BRC achieves state-of-the-art performance on segmentation tasks (DSC = 95.2%). Extensive experiments on 11048 patients show that the method significantly improves accuracy and reduces reliance on labeled data (with 1% of the data labeled, text assistance raises DSC from 81.5% to 86.6%).
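As a rough illustration of two ingredients the abstract refers to, the sketch below computes the Dice similarity coefficient (DSC) used to report segmentation quality and a generic CLIP-style symmetric contrastive loss over matched image and text embeddings. It is not the BRC implementation: the function names, the plain-list data layout, the empty-mask convention, and the `temperature=0.07` default are all assumptions made for this example, and the pyramid alignment, cycle consistency, and cross-modal attention components are not covered.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: pair i of images and texts is the positive."""
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

The loss is small when each image embedding is closest to its own text embedding and large when the pairing is shuffled, which is the behavior a contrastive pretraining objective relies on.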




Source journal

Medical Image Analysis (Engineering & Technology: Engineering, Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.