Missing-modality enabled multi-modal fusion architecture for medical data

Muyu Wang, Shiyu Fan, Yichen Li, Zhongrang Xie, Hui Chen

Journal of Biomedical Informatics, Volume 164, Article 104796 (published 2025-02-21). DOI: 10.1016/j.jbi.2025.104796
https://www.sciencedirect.com/science/article/pii/S1532046425000255
Citations: 0
Abstract
Background
Fusion of multi-modal data can improve the performance of deep learning models. However, missing modalities are common in medical data due to patient specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities.
Objective
This study aimed to develop an effective multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance for clinical tasks.
Methods
X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve the models' robustness to missing modalities during inference. Finally, we designed comparison and ablation experiments to validate the effectiveness of the fusion, the robustness to missing modalities, and the enhancement contributed by each key component. Experiments were conducted on the MIMIC-IV and MIMIC-CXR datasets with 14-label disease diagnosis and patient in-hospital mortality prediction tasks. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate model performance.
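The pairwise fusion described above can be sketched with single-head cross-attention, where each modality attends to the other and the three bi-modal outputs are concatenated into a tri-modal representation. The abstract does not specify the module internals, so the function names, dimensions, and the mean-pool/concatenate readout below are illustrative assumptions (plain NumPy stands in for a deep-learning framework):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats, d):
    # Single-head cross-attention: tokens of one modality attend
    # to the tokens of another (scaled dot-product).
    attn = softmax(query_feats @ kv_feats.T / np.sqrt(d))
    return attn @ kv_feats

def bimodal_fuse(a, b, d):
    # Symmetric pairwise fusion: each modality attends to the other;
    # the fused vector is the concatenated mean-pooled outputs.
    a2b = cross_attention(a, b, d)
    b2a = cross_attention(b, a, d)
    return np.concatenate([a2b.mean(axis=0), b2a.mean(axis=0)])

# Toy token sequences for the three modalities:
# image patches (CXR), report tokens (text), one row of tabular values.
rng = np.random.default_rng(0)
d = 16
img = rng.normal(size=(5, d))
txt = rng.normal(size=(7, d))
tab = rng.normal(size=(1, d))

# Three bi-modal modules combined into a tri-modal representation.
tri = np.concatenate([
    bimodal_fuse(img, txt, d),
    bimodal_fuse(img, tab, d),
    bimodal_fuse(txt, tab, d),
])
print(tri.shape)  # 3 pairs x 2 directions x d = (96,)
```

A missing modality could then be handled by zeroing or dropping the fusion terms that involve it, which is one way a training-time loss over modality subsets can encourage robustness at inference.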
Results
Our proposed architecture showed superior predictive performance, achieving average AUROC and AUPRC values of 0.916 and 0.551 in the 14-label classification task and 0.816 and 0.392 in the mortality prediction task, while the best average AUROC and AUPRC among the comparison methods were 0.876 and 0.492 in the 14-label classification task and 0.806 and 0.366 in the mortality prediction task. Both metrics decreased only slightly when tested with modality-incomplete data. Different levels of enhancement were achieved through the three key components.
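The two reported metrics can be computed from first principles; the helpers below are illustrative, not the authors' evaluation code. AUROC equals the Mann-Whitney probability that a randomly chosen positive is scored above a randomly chosen negative, and AUPRC is estimated here by average precision, one common step-wise estimator of the precision-recall area:

```python
def auroc(labels, scores):
    # Probability a random positive outranks a random negative
    # (ties count as half a win) -- the Mann-Whitney U / (n_pos * n_neg).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(labels, scores):
    # Average precision: sweep a descending-score threshold and
    # sum the precision at each newly recovered positive.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
            ap += tp / (tp + fp)
        else:
            fp += 1
    return ap / sum(labels)

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))  # 0.75
print(auprc(labels, scores))  # 0.8333...
```

In the multi-label setting of the paper, these per-label values would be averaged across the 14 diagnosis labels to obtain the reported means.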
Conclusions
The proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This architecture holds promise for scaling up to more modalities to enhance the clinical practicality of the model.
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.