Knowledge fusion in deep learning-based medical vision-language models: A review

IF 15.5 · Zone 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence
Dexuan Xu, Yanyuan Chen, Zhongyan Chai, Yifan Xiao, Yandong Yan, Weiping Ding, Hanpin Wang, Zhi Jin, Wenpin Jiao, Weihua Yue, Hang Li, Yu Huang
Journal: Information Fusion, Volume 125, Article 103455
Published: 2025-07-07
DOI: 10.1016/j.inffus.2025.103455
URL: https://www.sciencedirect.com/science/article/pii/S1566253525005287
Citations: 0

Abstract

Deep learning-based medical vision-language models automatically extract image features and fuse them with textual information, and they have driven rapid progress in multimodal medical artificial intelligence. However, the complexity of the medical domain requires such models to draw on deep professional knowledge, and knowledge fusion offers a new approach to medical vision-language tasks. Unlike existing reviews, this paper systematically organizes knowledge fusion methods in medical vision-language models from two perspectives: the stage at which knowledge is fused and the task-oriented fusion strategy, providing a new theoretical framework for research in the field. First, we describe in detail the categories of medical knowledge and the scenarios to which each applies. We then discuss deep learning-based knowledge fusion algorithms and summarize four knowledge fusion stages in medical vision-language models: data construction, pretraining, feature representation, and inference. In addition, we analyze the specific knowledge fusion strategies used in five types of medical vision-language tasks (medical report generation, medical visual question answering, medical language-guided segmentation, medical multimodal pretraining, and multimodal large language models) and summarize knowledge fusion-based evaluation methods. Finally, we outline future research directions, including enhanced interpretability, mixture-of-experts models, and knowledge editing, with the aim of giving researchers references of both theoretical and practical value.
Source journal
Information Fusion (Engineering and Technology; Computer Science, Theory and Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months
About the journal: Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.