Lexin Fang, Xuemei Li, Yunyang Xu, Fan Zhang, Caiming Zhang
{"title":"Driven by textual knowledge: A Text-View Enhanced Knowledge Transfer Network for lung infection region segmentation","authors":"Lexin Fang , Xuemei Li , Yunyang Xu , Fan Zhang , Caiming Zhang","doi":"10.1016/j.media.2025.103625","DOIUrl":null,"url":null,"abstract":"<div><div>Lung infections are the leading cause of death among infectious diseases, and accurate segmentation of the infected lung area is crucial for effective treatment. Currently, segmentation methods that rely solely on imaging data have limited accuracy. Incorporating text information enriched with expert knowledge into the segmentation process has emerged as a novel approach. However, previous methods often used unified text encoding strategies for extracting textual features. It failed to adequately emphasize critical details in the text, particularly the spatial location of infected regions. Moreover, the semantic space inconsistency between text and image features complicates cross-modal information transfer. To close these gaps, we propose a <strong>Text-View Enhanced Knowledge Transfer Network (TVE-Net)</strong> that leverages key information from textual data to assist in segmentation and enhance the model’s perception of lung infection locations. The method generates a text view by probabilistically modeling the location information of infected areas in text using a robust, carefully designed positional probability function. By assigning lesion probabilities to each image region, the infected areas’ spatial information from the text view is explicitly integrated into the segmentation model. Once the text view has been introduced, a unified image encoder can be employed to extract text view features, so that both text and images are mapped into the same space. In addition, a self-supervised constraint based on text-view overlap and feature consistency is proposed to enhance the model’s robustness and semi-supervised capability through feature augmentation. 
Meanwhile, the newly designed multi-stage knowledge transfer module utilizes a globally enhanced cross-attention mechanism to comprehensively learn the implicit correlations between image features and text-view features, enabling effective knowledge transfer from text-view features to image features. Extensive experiments demonstrate that TVE-Net outperforms both unimodal and multimodal methods in both fully supervised and semi-supervised lung infection segmentation tasks, achieving significant improvements on QaTa-COV19 and MosMedData+ datasets.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103625"},"PeriodicalIF":11.8000,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841525001720","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citation count: 0
Abstract
Lung infections are the leading cause of death among infectious diseases, and accurate segmentation of the infected lung area is crucial for effective treatment. Currently, segmentation methods that rely solely on imaging data have limited accuracy. Incorporating text information enriched with expert knowledge into the segmentation process has emerged as a novel approach. However, previous methods often used a unified text encoding strategy to extract textual features, which fails to adequately emphasize critical details in the text, particularly the spatial location of infected regions. Moreover, the semantic space inconsistency between text and image features complicates cross-modal information transfer. To close these gaps, we propose a Text-View Enhanced Knowledge Transfer Network (TVE-Net) that leverages key information from textual data to assist segmentation and enhance the model's perception of lung infection locations. The method generates a text view by probabilistically modeling the location information of infected areas in text using a robust, carefully designed positional probability function. By assigning lesion probabilities to each image region, the infected areas' spatial information from the text view is explicitly integrated into the segmentation model. Once the text view has been introduced, a unified image encoder can be employed to extract text-view features, so that both text and images are mapped into the same space. In addition, a self-supervised constraint based on text-view overlap and feature consistency is proposed to enhance the model's robustness and semi-supervised capability through feature augmentation. Meanwhile, the newly designed multi-stage knowledge transfer module utilizes a globally enhanced cross-attention mechanism to comprehensively learn the implicit correlations between image features and text-view features, enabling effective knowledge transfer from text-view features to image features.
Extensive experiments demonstrate that TVE-Net outperforms both unimodal and multimodal methods in both fully supervised and semi-supervised lung infection segmentation tasks, achieving significant improvements on QaTa-COV19 and MosMedData+ datasets.
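To make the text-view idea concrete: the abstract describes a positional probability function that converts a coarse textual location (e.g. "upper left lung") into lesion probabilities over image regions. The sketch below is a minimal, hypothetical stand-in for that function, not the paper's actual formulation: it maps coarse direction words to a normalized center and places an isotropic Gaussian over a coarse grid of image cells.

```python
import numpy as np

def text_view_map(horiz: str, vert: str, grid: int = 8, sigma: float = 1.5):
    """Build a grid x grid lesion-probability map from a coarse textual
    location. This is an illustrative stand-in for the paper's positional
    probability function; the word-to-coordinate table, grid size, and
    Gaussian form are all assumptions, not the published design."""
    # Map coarse location words to normalized [0, 1] center coordinates.
    h = {"left": 0.25, "middle": 0.5, "right": 0.75}[horiz]
    v = {"upper": 0.25, "middle": 0.5, "lower": 0.75}[vert]
    ys, xs = np.mgrid[0:grid, 0:grid]
    # Centers of the grid cells in normalized coordinates.
    cx, cy = (xs + 0.5) / grid, (ys + 0.5) / grid
    # Isotropic Gaussian centered on the described location.
    d2 = (cx - h) ** 2 + (cy - v) ** 2
    p = np.exp(-d2 / (2 * (sigma / grid) ** 2))
    return p / p.max()  # normalize so the described region peaks at 1

# A lesion described in the upper-left lung yields high probability
# in the top-left cells and low probability elsewhere.
view = text_view_map("left", "upper")
print(view.shape)  # (8, 8)
```

Such a map lives in image space, which is consistent with the abstract's point that a unified image encoder can then extract text-view features in the same semantic space as the image features.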
Journal overview:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.