What are you looking at? Modality contribution in multimodal medical deep learning.

IF 2.3 · CAS Region 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL
Christian Gapp, Elias Tappeiner, Martin Welk, Karl Fritscher, Elke R Gizewski, Rainer Schubert
{"title":"What are you looking at? Modality contribution in multimodal medical deep learning.","authors":"Christian Gapp, Elias Tappeiner, Martin Welk, Karl Fritscher, Elke R Gizewski, Rainer Schubert","doi":"10.1007/s11548-025-03523-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>High dimensional, multimodal data can nowadays be analyzed by huge deep neural networks with little effort. Several fusion methods for bringing together different modalities have been developed. Given the prevalence of high-dimensional, multimodal patient data in medicine, the development of multimodal models marks a significant advancement. However, how these models process information from individual sources in detail is still underexplored.</p><p><strong>Methods: </strong>To this end, we implemented an occlusion-based modality contribution method that is both model- and performance agnostic. This method quantitatively measures the importance of each modality in the dataset for the model to fulfill its task. We applied our method to three different multimodal medical problems for experimental purposes.</p><p><strong>Results: </strong>Herein we found that some networks have modality preferences that tend to unimodal collapses, while some datasets are imbalanced from the ground up. Moreover, we provide fine-grained quantitative and visual attribute importance for each modality.</p><p><strong>Conclusion: </strong>Our metric offers valuable insights that can support the advancement of multimodal model development and dataset creation. By introducing this method, we contribute to the growing field of interpretability in deep learning for multimodal research. This approach helps to facilitate the integration of multimodal AI into clinical practice. Our code is publicly available at https://github.com/ChristianGappGit/MC_MMD.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03523-w","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: High-dimensional, multimodal data can nowadays be analyzed by large deep neural networks with little effort. Several fusion methods for bringing together different modalities have been developed. Given the prevalence of high-dimensional, multimodal patient data in medicine, the development of multimodal models marks a significant advancement. However, how these models process information from the individual sources remains underexplored.

Methods: To this end, we implemented an occlusion-based modality contribution method that is both model- and performance-agnostic. The method quantitatively measures how important each modality in the dataset is for the model to fulfill its task. We applied it to three different multimodal medical problems for experimental purposes.
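The abstract describes the method only at a high level. As an illustration, the following is a minimal sketch of how an occlusion-based modality contribution score could be computed: each modality is occluded in turn and the change in the model's output is measured. The interface (a model accepting a dict of modality tensors, the constant occlusion value) is an assumption made for this sketch and does not reflect the authors' actual implementation, which is available in their repository.

```python
import torch

def modality_contribution(model, batch, occlusion_value=0.0):
    """Minimal sketch of an occlusion-based modality contribution score.

    Hypothetical setup: `batch` is a dict mapping modality names to tensors,
    and `model` accepts such a dict. For each modality, its input is replaced
    by a constant (occluded) and the change in the model's output relative to
    the unoccluded prediction is measured; a larger change means a larger
    contribution of that modality. No labels are needed, so the score is
    performance-agnostic.
    """
    model.eval()
    with torch.no_grad():
        baseline = model(batch)  # prediction with all modalities present
        deltas = {}
        for name, tensor in batch.items():
            occluded = dict(batch)
            occluded[name] = torch.full_like(tensor, occlusion_value)
            pred = model(occluded)  # prediction with this modality occluded
            deltas[name] = (baseline - pred).abs().mean().item()
    # Normalize so the contributions sum to 1 across modalities
    total = sum(deltas.values()) or 1.0
    return {name: d / total for name, d in deltas.items()}
```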

Results: We found that some networks exhibit modality preferences that tend toward unimodal collapse, while some datasets are imbalanced from the outset. Moreover, we provide fine-grained quantitative and visual attribute importance for each modality.
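The fine-grained attribute importance mentioned here can be illustrated by pushing the same occlusion idea inside a single modality. The sketch below assumes a tabular modality whose columns are occluded one at a time; the function name, arguments, and tensor shape are illustrative assumptions, not the authors' exact procedure.

```python
import torch

def attribute_importance(model, batch, modality, occlusion_value=0.0):
    """Sketch of fine-grained attribute importance within one modality.

    Hypothetical setup: batch[modality] is a tensor of shape
    (batch_size, n_attributes), e.g. tabular clinical data. Each attribute
    (column) is occluded in turn, and the resulting change in the model's
    output is taken as that attribute's importance score.
    """
    model.eval()
    with torch.no_grad():
        baseline = model(batch)
        scores = []
        n_attributes = batch[modality].shape[1]
        for j in range(n_attributes):
            occluded = dict(batch)
            occluded[modality] = batch[modality].clone()
            occluded[modality][:, j] = occlusion_value  # occlude one attribute
            pred = model(occluded)
            scores.append((baseline - pred).abs().mean().item())
    return scores  # one importance score per attribute (column)
```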

Conclusion: Our metric offers valuable insights that can support the advancement of multimodal model development and dataset creation. By introducing this method, we contribute to the growing field of interpretability in deep learning for multimodal research. The approach helps facilitate the integration of multimodal AI into clinical practice. Our code is publicly available at https://github.com/ChristianGappGit/MC_MMD.

Source journal:
International Journal of Computer Assisted Radiology and Surgery (ENGINEERING, BIOMEDICAL; RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)
CiteScore: 5.90
Self-citation rate: 6.70%
Articles per year: 243
Review time: 6-12 weeks
Journal description: The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.