A survey of deep-learning-based radiology report generation using multimodal inputs

Impact Factor 11.8 · CAS Zone 1 (Medicine) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Medical Image Analysis · Pub Date: 2025-05-13 · DOI: 10.1016/j.media.2025.103627

Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Wei Emma Zhang, Weitong Chen, Xin Chen

Medical Image Analysis, Volume 103, Article 103627. Journal article; not open access. URL: https://www.sciencedirect.com/science/article/pii/S1361841525001744

Citations: 0

Abstract

Automatic radiology report generation can reduce physicians' workload and narrow regional disparities in medical resources, and has therefore become an important topic in medical image analysis. The task is challenging because the computational model must mimic a physician: it has to extract information from multimodal input data (e.g., medical images, clinical information, and medical knowledge) and produce comprehensive, accurate reports. Numerous recent works address this problem with deep-learning-based methods such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components: multimodal data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each component are highlighted. Additionally, we summarize the latest developments in large-model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We also conduct a quantitative comparison of different methods under the same experimental setting. This is the most up-to-date survey focusing on multimodal inputs and data fusion for radiology report generation. It aims to provide comprehensive information for researchers interested in automatic clinical report generation and medical image analysis, especially with multimodal inputs, and to assist them in developing new algorithms to advance the field.

Source journal

Medical Image Analysis (Engineering and Technology: Biomedical Engineering)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes high-quality, original papers that contribute to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches that use biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.