MvKeTR: Chest CT Report Generation With Multi-View Perception and Knowledge Enhancement.

IF 6.7 2区医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Journal of Biomedical and Health Informatics Pub Date : 2025-06-19 DOI:10.1109/JBHI.2025.3581289

Xiwei Deng, Xianchun He, Jianfeng Bao, Yudan Zhou, Shuhui Cai, Congbo Cai, Zhong Chen

{"title":"MvKeTR: Chest CT Report Generation With Multi-View Perception and Knowledge Enhancement.","authors":"Xiwei Deng, Xianchun He, Jianfeng Bao, Yudan Zhou, Shuhui Cai, Congbo Cai, Zhong Chen","doi":"10.1109/JBHI.2025.3581289","DOIUrl":null,"url":null,"abstract":"<p><p>CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced TansfoRmer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) as the fundamental building blocks of both modules, which exhibit superior parameter efficiency and reduced spectral bias to better capture high-frequency components critical for CT interpretation while mitigating overfitting. Extensive experiments on the public CTRG-Chest-548 K dataset demonstrate that our method outpaces prior state-of-the-art (SOTA) models across almost all metrics. The code is available at https://github.com/xiweideng/MvKeTR.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Biomedical and Health Informatics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/JBHI.2025.3581289","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

CT report generation (CTRG) aims to automatically generate diagnostic reports for 3D volumes, relieving clinicians' workload and improving patient care. Despite clinical value, existing works fail to effectively incorporate diagnostic information from multiple anatomical views and lack related clinical expertise essential for accurate and reliable diagnosis. To resolve these limitations, we propose a novel Multi-view perception Knowledge-enhanced TansfoRmer (MvKeTR) to mimic the diagnostic workflow of clinicians. Just as radiologists first examine CT scans from multiple planes, a Multi-View Perception Aggregator (MVPA) with view-aware attention is proposed to synthesize diagnostic information from multiple anatomical views effectively. Then, inspired by how radiologists further refer to relevant clinical records to guide diagnostic decision-making, a Cross-Modal Knowledge Enhancer (CMKE) is devised to retrieve the most similar reports based on the query volume to incorporate domain knowledge into the diagnosis procedure. Furthermore, instead of traditional MLPs, we employ Kolmogorov-Arnold Networks (KANs) as the fundamental building blocks of both modules, which exhibit superior parameter efficiency and reduced spectral bias to better capture high-frequency components critical for CT interpretation while mitigating overfitting. Extensive experiments on the public CTRG-Chest-548 K dataset demonstrate that our method outpaces prior state-of-the-art (SOTA) models across almost all metrics. The code is available at https://github.com/xiweideng/MvKeTR.

查看原文本刊更多论文

MvKeTR：基于多视图感知和知识增强的胸部CT报告生成。

CT报告生成（CTRG）旨在自动生成3D卷的诊断报告，减轻临床医生的工作量，改善患者护理。尽管具有临床价值，但现有的工作未能有效地整合多个解剖视图的诊断信息，并且缺乏准确可靠诊断所必需的相关临床专业知识。为了解决这些限制，我们提出了一种新的多视图感知知识增强转换器（MvKeTR）来模拟临床医生的诊断工作流程。就像放射科医生首先从多个平面检查CT扫描一样，提出了一种具有视图感知注意力的多视图感知聚合器（MVPA）来有效地综合来自多个解剖视图的诊断信息。然后，受放射科医生如何进一步参考相关临床记录来指导诊断决策的启发，设计了一个跨模式知识增强器（CMKE），基于查询量检索最相似的报告，将领域知识纳入诊断过程。此外，我们采用Kolmogorov-Arnold网络（KANs）作为两个模块的基本构建模块，而不是传统的mlp，它具有卓越的参数效率和减少的频谱偏差，可以更好地捕获对CT解释至关重要的高频成分，同时减轻过拟合。在公共ctrg - chest - 548k数据集上进行的大量实验表明，我们的方法在几乎所有指标上都超过了先前最先进的（SOTA）模型。代码可在https://github.com/xiweideng/MvKeTR上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Journal of Biomedical and Health Informatics COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

13.60

自引率

6.50%

发文量

1151

期刊介绍： IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.