Driven by textual knowledge: A Text-View Enhanced Knowledge Transfer Network for lung infection region segmentation

Impact factor 11.8; Zone 1 (Medicine); JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Medical Image Analysis, Volume 103, Article 103625. Published: 2025-05-12. DOI: 10.1016/j.media.2025.103625

Lexin Fang, Xuemei Li, Yunyang Xu, Fan Zhang, Caiming Zhang

Citations: 0

Abstract

Lung infections are the leading cause of death among infectious diseases, and accurate segmentation of the infected lung area is crucial for effective treatment. Segmentation methods that rely solely on imaging data currently have limited accuracy. Incorporating text enriched with expert knowledge into the segmentation process has emerged as a new approach. However, previous methods typically extract textual features with a uniform text-encoding strategy, which fails to adequately emphasize critical details in the text, particularly the spatial location of infected regions. Moreover, the inconsistency between the semantic spaces of text and image features complicates cross-modal information transfer. To close these gaps, we propose a Text-View Enhanced Knowledge Transfer Network (TVE-Net) that leverages key information from textual data to assist segmentation and to enhance the model's perception of lung infection locations. TVE-Net generates a text view by probabilistically modeling the location of infected areas described in the text with a carefully designed, robust positional probability function. By assigning a lesion probability to each image region, the text view explicitly integrates the spatial information of infected areas into the segmentation model. With the text view in place, a unified image encoder can extract text-view features, so that text and images are mapped into the same space. In addition, a self-supervised constraint based on text-view overlap and feature consistency improves the model's robustness and semi-supervised capability through feature augmentation. A newly designed multi-stage knowledge transfer module uses a globally enhanced cross-attention mechanism to learn the implicit correlations between image features and text-view features, enabling effective knowledge transfer from text-view features to image features. Extensive experiments on the QaTa-COV19 and MosMedData+ datasets show that TVE-Net outperforms both unimodal and multimodal methods in fully supervised and semi-supervised lung infection segmentation, with significant improvements.
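The abstract does not give the positional probability function or the exact attention design, so the following is only a minimal sketch under stated assumptions, not TVE-Net's implementation. It assumes report phrases such as "lower left lung", maps each (zone, side) pair to a hypothetical region centre, places an isotropic Gaussian there, and merges several mentioned regions with a probabilistic OR so that every pixel gets a lesion probability in [0, 1]. A single shared patch embedding stands in for the "unified image encoder" that encodes both the image and the text view, and plain single-head cross-attention (image tokens as queries, text-view tokens as keys and values) adds text-view information back to the image features. The names `REGION_CENTRES`, `parse_locations`, `text_view`, `patch_embed`, `cross_attention`, the Gaussian form, and `sigma` are all hypothetical; the paper's global enhancement, multi-stage design, and self-supervised overlap constraint are not reproduced.

```python
import re

import numpy as np

# Illustrative sketch only: region centres, the Gaussian form, and all names are
# assumptions, not TVE-Net's published positional probability function.
# Centres are (row_fraction, col_fraction) on an H x W image, row 0 at the top,
# assuming the radiological convention (patient's left lung on the image right).
REGION_CENTRES = {
    ("upper", "left"): (0.25, 0.70),
    ("middle", "left"): (0.50, 0.70),
    ("lower", "left"): (0.75, 0.70),
    ("upper", "right"): (0.25, 0.30),
    ("middle", "right"): (0.50, 0.30),
    ("lower", "right"): (0.75, 0.30),
}


def parse_locations(report: str) -> list[tuple[str, str]]:
    """Extract (zone, side) pairs from phrases such as 'lower left lung'."""
    pairs = []
    for phrase in re.split(r",| and ", report.lower()):
        words = re.findall(r"[a-z]+", phrase)
        zones = [z for z in ("upper", "middle", "lower") if z in words]
        sides = [s for s in ("left", "right") if s in words]
        pairs.extend((z, s) for z in zones for s in sides)
    return pairs


def text_view(report: str, h: int, w: int, sigma: float = 0.12) -> np.ndarray:
    """Turn location phrases into a per-pixel lesion-probability map in [0, 1]."""
    rows = (np.arange(h) + 0.5) / h  # normalised pixel-centre coordinates
    cols = (np.arange(w) + 0.5) / w
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    p_clear = np.ones((h, w))  # probability that no mentioned region covers a pixel
    for zone, side in parse_locations(report):
        r0, c0 = REGION_CENTRES[(zone, side)]
        g = np.exp(-((rr - r0) ** 2 + (cc - c0) ** 2) / (2 * sigma**2))
        p_clear *= 1.0 - g  # probabilistic OR across mentioned regions
    return 1.0 - p_clear


def patch_embed(x: np.ndarray, w_embed: np.ndarray, patch: int = 8) -> np.ndarray:
    """Shared stand-in encoder: flatten non-overlapping patches, then project."""
    h, w = x.shape
    tiles = x.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(-1, patch * patch) @ w_embed


def cross_attention(img, tv, wq, wk, wv):
    """Single-head cross-attention: image tokens query text-view tokens."""
    q, k, v = img @ wq, tv @ wk, tv @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # each row sums to 1
    return img + attn @ v, attn  # residual update of the image features


rng = np.random.default_rng(0)
d = 32
report = "Bilateral infection, lower left lung and upper right lung."
image = rng.random((64, 64))  # stand-in for a grayscale chest image
view = text_view(report, 64, 64)
w_embed = rng.normal(size=(64, d)) / 8
img_tokens = patch_embed(image, w_embed)  # the same weights encode both inputs
tv_tokens = patch_embed(view, w_embed)
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, attn = cross_attention(img_tokens, tv_tokens, wq, wk, wv)
print(parse_locations(report), fused.shape)  # [('lower', 'left'), ('upper', 'right')] (64, 32)
```

Merging regions as 1 − ∏(1 − g) keeps every value in [0, 1] and lets overlapping mentions reinforce each other without exceeding 1. Because the text view is a spatial map of the same size as the image, one encoder can process both, which is the intuition behind mapping text and images into the same space; a trained encoder and learned projections would replace the random weights used here.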

Source journal

Medical Image Analysis (category: Engineering & Technology, Engineering: Biomedical)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in medical and biological image analysis, with a focus on applying computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes high-quality, original papers that contribute to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches that use biomedical image datasets at all spatial scales, from molecular/cellular imaging to tissue/organ imaging.