Enhancing the vision–language foundation model with key semantic knowledge-emphasized report refinement

IF 10.7; Zone 1, Medicine; JCR Q1, Computer Science, Artificial Intelligence

Medical Image Analysis. Pub Date: 2024-08-13. DOI: 10.1016/j.media.2024.103299

Weijian Huang, Cheng Li, Hao Yang, Jiarun Liu, Yong Liang, Hairong Zheng, Shanshan Wang

Volume 97, Article 103299. https://www.sciencedirect.com/science/article/pii/S136184152400224X


Abstract

Recently, vision–language representation learning has made remarkable advances in building medical foundation models, holding immense potential for transforming clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes contain redundant descriptions, which makes it difficult for representation learning to capture the key semantic information. This paper develops a novel iterative vision–language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. In particular, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to learn progressively: it starts by gaining a general understanding of the patient's condition from raw reports, then gradually refines and extracts the critical information essential to fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. The framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for different clinical applications.
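
The abstract does not describe the clinical dictionary or the two knowledge-enhancement metrics in detail, so the following is only a toy sketch of what dictionary-based report refinement could look like. The term list `CLINICAL_TERMS`, the function name `refine_report`, and the sentence-splitting rule are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical term list for illustration only; the paper's actual
# clinical dictionary is not given in the abstract.
CLINICAL_TERMS = frozenset(
    {"pneumonia", "effusion", "cardiomegaly", "pneumothorax", "edema"}
)


def refine_report(report, terms=CLINICAL_TERMS):
    """Keep only the sentences that mention at least one dictionary term."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    kept = []
    for sentence in sentences:
        # Lowercased alphabetic tokens of the sentence.
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & terms:
            kept.append(sentence)
    return " ".join(kept)


raw = (
    "The patient was rotated during acquisition. "
    "There is a small left pleural effusion. "
    "Comparison is made to the prior study. "
    "No pneumothorax is seen."
)
print(refine_report(raw))
# prints: There is a small left pleural effusion. No pneumothorax is seen.
```

A plain keyword filter like this ignores the model-optimized metrics and the iterative training loop mentioned in the abstract; it only illustrates the general idea of dropping sentences that carry no key clinical terms.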


Source journal

Medical Image Analysis (Engineering Technology, Engineering: Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Publication volume: 309

Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.