Missing-modality enabled multi-modal fusion architecture for medical data

Muyu Wang, Shiyu Fan, Yichen Li, Zhongrang Xie, Hui Chen

Journal of Biomedical Informatics, Volume 164, Article 104796 (published 2025-02-21). DOI: 10.1016/j.jbi.2025.104796
https://www.sciencedirect.com/science/article/pii/S1532046425000255
Citations: 0
Abstract
Background
Fusion of multi-modal data can improve the performance of deep learning models. However, missing modalities are common in medical data due to patient specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities.
Objective
This study aimed to develop an effective multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance for clinical tasks.
Methods
X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve the models' robustness to missing modalities during inference. Finally, we designed comparison and ablation experiments to validate the effectiveness of the fusion, the robustness to missing modalities, and the enhancement contributed by each key component. Experiments were conducted on the MIMIC-IV and MIMIC-CXR datasets with 14-label disease diagnosis and patient in-hospital mortality prediction tasks. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate model performance.
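The pairwise fusion described above can be sketched with single-head cross-attention, where each modality attends to the other and the three bi-modal outputs are concatenated into a tri-modal representation. The abstract does not specify the module internals, so the function names, dimensions, and the mean-pool/concatenate readout below are illustrative assumptions (plain NumPy stands in for a deep-learning framework):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats, d):
    # Single-head cross-attention: tokens of one modality attend
    # to the tokens of another (scaled dot-product).
    attn = softmax(query_feats @ kv_feats.T / np.sqrt(d))
    return attn @ kv_feats

def bimodal_fuse(a, b, d):
    # Symmetric pairwise fusion: each modality attends to the other;
    # the fused vector is the concatenated mean-pooled outputs.
    a2b = cross_attention(a, b, d)
    b2a = cross_attention(b, a, d)
    return np.concatenate([a2b.mean(axis=0), b2a.mean(axis=0)])

# Toy token sequences for the three modalities:
# image patches (CXR), report tokens (text), one row of tabular values.
rng = np.random.default_rng(0)
d = 16
img = rng.normal(size=(5, d))
txt = rng.normal(size=(7, d))
tab = rng.normal(size=(1, d))

# Three bi-modal modules combined into a tri-modal representation.
tri = np.concatenate([
    bimodal_fuse(img, txt, d),
    bimodal_fuse(img, tab, d),
    bimodal_fuse(txt, tab, d),
])
print(tri.shape)  # 3 pairs x 2 directions x d = (96,)
```

A missing modality could then be handled by zeroing or dropping the fusion terms that involve it, which is one way a training-time loss over modality subsets can encourage robustness at inference.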
Results
Our proposed architecture showed superior predictive performance, achieving average AUROC and AUPRC values of 0.916 and 0.551 in the 14-label classification task and 0.816 and 0.392 in the mortality prediction task, while the best average AUROC and AUPRC among the comparison methods were 0.876 and 0.492 in the 14-label classification task and 0.806 and 0.366 in the mortality prediction task. Both metrics decreased only slightly when tested with modality-incomplete data. Different levels of enhancement were achieved through the three key components.
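The two reported metrics can be computed from first principles; the helpers below are illustrative, not the authors' evaluation code. AUROC equals the Mann-Whitney probability that a randomly chosen positive is scored above a randomly chosen negative, and AUPRC is estimated here by average precision, one common step-wise estimator of the precision-recall area:

```python
def auroc(labels, scores):
    # Probability a random positive outranks a random negative
    # (ties count as half a win) -- the Mann-Whitney U / (n_pos * n_neg).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(labels, scores):
    # Average precision: sweep a descending-score threshold and
    # sum the precision at each newly recovered positive.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
            ap += tp / (tp + fp)
        else:
            fp += 1
    return ap / sum(labels)

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))  # 0.75
print(auprc(labels, scores))  # 0.8333...
```

In the multi-label setting of the paper, these per-label values would be averaged across the 14 diagnosis labels to obtain the reported means.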
Conclusions
The proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This architecture holds promise for scaling up to more modalities to enhance the clinical practicality of the model.
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.