Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation

IF 11.6 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wenting Chen, Jie Liu, Tianming Liu, Yixuan Yuan
{"title":"Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation","authors":"Wenting Chen, Jie Liu, Tianming Liu, Yixuan Yuan","doi":"10.1007/s11263-024-02246-w","DOIUrl":null,"url":null,"abstract":"<p>Medical reports containing specific diagnostic results and additional information not present in medical images can be effectively employed to assist image understanding tasks, and the modality gap between vision and language can be bridged by vision-language matching (VLM). However, current vision-language models distort the intra-model relation and only include class information in reports that is insufficient for segmentation task. In this paper, we introduce a novel Bi-level class-severity-aware Vision-Language Graph Matching (Bi-VLGM) for text guided medical image segmentation, composed of a word-level VLGM module and a sentence-level VLGM module, to exploit the class-severity-aware relation among visual-textual features. In word-level VLGM, to mitigate the distorted intra-modal relation during VLM, we reformulate VLM as graph matching problem and introduce a vision-language graph matching (VLGM) to exploit the high-order relation among visual-textual features. Then, we perform VLGM between the local features for each class region and class-aware prompts to bridge their gap. In sentence-level VLGM, to provide disease severity information for segmentation task, we introduce a severity-aware prompting to quantify the severity level of disease lesion, and perform VLGM between the global features and the severity-aware prompts. By exploiting the relation between the local (global) and class (severity) features, the segmentation model can include the class-aware and severity-aware information to promote segmentation performance. Extensive experiments proved the effectiveness of our method and its superiority to existing methods. The source code will be released.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"64 1","pages":""},"PeriodicalIF":11.6000,"publicationDate":"2024-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11263-024-02246-w","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Medical reports containing specific diagnostic results and additional information not present in medical images can be effectively employed to assist image understanding tasks, and the modality gap between vision and language can be bridged by vision-language matching (VLM). However, current vision-language models distort the intra-model relation and only include class information in reports that is insufficient for segmentation task. In this paper, we introduce a novel Bi-level class-severity-aware Vision-Language Graph Matching (Bi-VLGM) for text guided medical image segmentation, composed of a word-level VLGM module and a sentence-level VLGM module, to exploit the class-severity-aware relation among visual-textual features. In word-level VLGM, to mitigate the distorted intra-modal relation during VLM, we reformulate VLM as graph matching problem and introduce a vision-language graph matching (VLGM) to exploit the high-order relation among visual-textual features. Then, we perform VLGM between the local features for each class region and class-aware prompts to bridge their gap. In sentence-level VLGM, to provide disease severity information for segmentation task, we introduce a severity-aware prompting to quantify the severity level of disease lesion, and perform VLGM between the global features and the severity-aware prompts. By exploiting the relation between the local (global) and class (severity) features, the segmentation model can include the class-aware and severity-aware information to promote segmentation performance. Extensive experiments proved the effectiveness of our method and its superiority to existing methods. The source code will be released.

Abstract Image

Bi-VLGM:用于文本引导医学图像分割的双级别类严重性感知视觉语言图匹配
包含特定诊断结果和医学图像中不存在的附加信息的医疗报告可有效用于辅助图像理解任务,视觉语言匹配(VLM)可弥合视觉和语言之间的模态差距。然而,目前的视觉语言模型扭曲了模型内部的关系,只包含报告中的类别信息,不足以完成分割任务。在本文中,我们介绍了一种用于文本引导的医学图像分割的新型双级类严重性感知视觉语言图匹配(Bi-VLGM),由词级 VLGM 模块和句子级 VLGM 模块组成,以利用视觉文本特征之间的类严重性感知关系。在词级 VLGM 中,为了减少 VLM 过程中模态内关系的扭曲,我们将 VLM 重新表述为图匹配问题,并引入视觉语言图匹配(VLGM)来利用视觉-文本特征之间的高阶关系。然后,我们在每个类别区域的局部特征和类别感知提示之间进行 VLGM,以弥补它们之间的差距。在句子级 VLGM 中,为了给分割任务提供疾病严重程度信息,我们引入了严重程度感知提示来量化疾病病变的严重程度,并在全局特征和严重程度感知提示之间进行 VLGM。通过利用局部(全局)特征和类别(严重程度)特征之间的关系,分割模型可以包含类别感知和严重程度感知信息,从而提高分割性能。广泛的实验证明了我们方法的有效性及其优于现有方法的优势。我们将公布源代码。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
International Journal of Computer Vision
International Journal of Computer Vision 工程技术-计算机:人工智能
CiteScore
29.80
自引率
2.10%
发文量
163
审稿时长
6 months
期刊介绍: The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信