Spatio-Temporal and Retrieval-Augmented Modeling for Chest X-Ray Report Generation

IEEE Transactions on Medical Imaging. Pub Date: 2025-03-25. DOI: 10.1109/TMI.2025.3554498

Yan Yang;Xiaoxing You;Ke Zhang;Zhenqi Fu;Xianyun Wang;Jiajun Ding;Jiamei Sun;Zhou Yu;Qingming Huang;Weidong Han;Jun Yu

Volume 44, Issue 7, pp. 2892-2905. Full text: https://ieeexplore.ieee.org/document/10938723/


Abstract

Chest X-ray report generation has attracted increasing research attention. However, most existing methods neglect temporal information and typically generate reports conditioned on a fixed number of images. In this paper, we propose STREAM: Spatio-Temporal and REtrieval-Augmented Modelling for automatic chest X-ray report generation. It mimics clinical diagnosis by integrating current and historical studies to interpret the present condition (temporal), with each study containing images from multiple views (spatial). Concretely, STREAM is built upon an encoder-decoder architecture that uses a large language model (LLM) as the decoder. Spatio-temporal visual dynamics are packed as visual prompts, and regional semantic entities are retrieved as textual prompts. First, a token packer captures condensed spatio-temporal visual dynamics, enabling the flexible fusion of images from current and historical studies. Second, to augment generation with existing knowledge and regional details, a progressive semantic retriever retrieves semantic entities from a preconstructed knowledge bank as heuristic text prompts. The knowledge bank encapsulates anatomical chest X-ray knowledge into structured entities, each linked to a specific chest region. Extensive experiments on public datasets show that the method achieves state-of-the-art performance. The code and the knowledge bank are available at https://github.com/yangyan22/STREAM.
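
To make the retrieval idea concrete, the following is a minimal sketch of how entities linked to chest regions might be looked up in a knowledge bank and joined into a text prompt. It is not the authors' implementation: the bank contents, the 3-dimensional embeddings, the cosine-similarity ranking, and the names `KNOWLEDGE_BANK`, `retrieve_entities`, and `build_text_prompt` are all assumptions made for illustration, and the "progressive" aspect of the retriever is not modeled here.

```python
import math

# Hypothetical toy knowledge bank: chest region -> list of (entity, embedding).
# Entity names and embeddings are invented for illustration only.
KNOWLEDGE_BANK = {
    "lung": [
        ("pleural effusion", [0.9, 0.1, 0.0]),
        ("pneumothorax", [0.1, 0.9, 0.0]),
        ("clear lungs", [0.0, 0.1, 0.9]),
    ],
    "heart": [
        ("cardiomegaly", [0.8, 0.2, 0.1]),
        ("normal heart size", [0.1, 0.2, 0.9]),
    ],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_entities(region_queries, bank, top_k=1):
    """For each region, return the top_k entities most similar to its query."""
    result = {}
    for region, query in region_queries.items():
        candidates = bank.get(region, [])
        ranked = sorted(
            candidates, key=lambda item: cosine(query, item[1]), reverse=True
        )
        result[region] = [name for name, _ in ranked[:top_k]]
    return result


def build_text_prompt(retrieved):
    """Join retrieved entities into one prompt string, regions in sorted order."""
    parts = [
        f"{region}: {', '.join(names)}"
        for region, names in sorted(retrieved.items())
    ]
    return "; ".join(parts)


# Assumed per-region visual query embeddings (in practice these would come
# from an image encoder; here they are hand-written).
queries = {"lung": [1.0, 0.0, 0.1], "heart": [0.0, 0.1, 1.0]}
retrieved = retrieve_entities(queries, KNOWLEDGE_BANK, top_k=1)
prompt = build_text_prompt(retrieved)
print(prompt)
```

In a setup like the one the abstract describes, a string of this kind would be passed to the LLM decoder alongside the visual prompts; how STREAM actually formats and injects it is documented in the linked repository, not here.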



