Towards robust multimodal ultrasound classification for liver tumor diagnosis: A generative approach to modality missingness

IF 4.9 · JCR Q1 (Computer Science, Interdisciplinary Applications) · CAS Tier 2 (Medicine)
Jiali Guo, Rui Bu, Wanting Shen, Tao Feng
{"title":"Towards robust multimodal ultrasound classification for liver tumor diagnosis: A generative approach to modality missingness","authors":"Jiali Guo ,&nbsp;Rui Bu ,&nbsp;Wanting Shen ,&nbsp;Tao Feng","doi":"10.1016/j.cmpb.2025.108759","DOIUrl":null,"url":null,"abstract":"<div><h3>Background and Objective</h3><div>In medical image analysis, combining multiple imaging modalities enhances diagnostic accuracy by providing complementary information. However, missing modalities are common in clinical settings, limiting the effectiveness of multimodal models. This study addresses the challenge of missing modalities in liver tumor diagnosis by proposing a generative model-based method for cross-modality reconstruction and classification. The dataset for this study comprises 359 case data from a hospital, with each case including three modality data: B-mode ultrasound images, Color Doppler Flow Imaging (CDFI), and clinical data. Only cases with one missing image modality are considered, excluding those with missing clinical data.</div></div><div><h3>Methods</h3><div>We developed a multimodal classification framework specifically for liver tumor diagnosis, employing various feature extraction networks to explore the impact of different modality combinations on classification performance when only available modalities are used. DenseNet extracts CDFI features, while EfficientNet is employed for B-mode ultrasound image feature extraction. These features are then flattened and concatenated with clinical data using feature-level fusion to obtain a full-modality model. Modality weight parameters are introduced to emphasize the importance of different modalities, yielding Model_D, which serves as the classification model after subsequent image modality supplementation. In cases of missing modalities, generative models, including U-GAT-IT and MSA-GAN, are utilized for cross-modal reconstruction of missing B-mode ultrasound or CDFI images (e.g., reconstructing CDFI from B-mode ultrasound when CDFI is missing). After evaluating the usability of the generated images, they are input into Model_D as supplementary images for the missing modalities.</div></div><div><h3>Results</h3><div>Model performance and modality supplementation effects were evaluated through accuracy, precision, recall, F1 score, and AUC metrics. The results demonstrate that the proposed Model_D, which introduces modality weights, achieves an accuracy of 88.57 %, precision of 87.97 %, recall of 82.32 %, F1 score of 0.87, and AUC of 0.95 in the full-modality classification task for liver tumors. Moreover, images reconstructed using U-GAT-IT and MSA-GAN across modalities exhibit PSNR &gt; 20 and multi-scale structural similarity &gt; 0.7, indicating moderate image quality with well-preserved overall structures, suitable for input into the model as supplementary images in cases of missing modalities. The supplementary CDFI or B-mode ultrasound images achieve 87.10 % and 86.43 % accuracy, respectively, with AUC values of 0.92 and 0.95. This proves that even in the absence of certain modalities, the generative models can effectively reconstruct missing images, maintaining high classification performance comparable to that in complete modality scenarios.</div></div><div><h3>Conclusions</h3><div>The generative model-based approach for modality reconstruction significantly improves the robustness of multimodal classification models, particularly in the context of liver tumor diagnosis. 
This method enhances the clinical applicability of multimodal models by ensuring high diagnostic accuracy despite missing modalities. Future work will explore further improvements in modality reconstruction techniques to increase the generalization and reliability of the model in various clinical settings.</div></div>","PeriodicalId":10624,"journal":{"name":"Computer methods and programs in biomedicine","volume":"265 ","pages":"Article 108759"},"PeriodicalIF":4.9000,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169260725001762","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Background and Objective

In medical image analysis, combining multiple imaging modalities enhances diagnostic accuracy by providing complementary information. However, missing modalities are common in clinical settings, limiting the effectiveness of multimodal models. This study addresses the challenge of missing modalities in liver tumor diagnosis by proposing a generative-model-based method for cross-modality reconstruction and classification. The dataset comprises 359 cases from a single hospital, each including three modalities: B-mode ultrasound images, Color Doppler Flow Imaging (CDFI), and clinical data. Only cases missing a single image modality are considered; cases with missing clinical data are excluded.
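The abstract does not define a data schema; purely as an illustration of this setting (three modalities per case, at most one image modality missing, clinical data always present), a minimal Python sketch with hypothetical field names might look like this:

```python
# Hypothetical case record for the setting described above. Field names
# and types are assumptions for illustration; the paper defines no schema.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class LiverCase:
    bmode: Optional[np.ndarray]   # B-mode ultrasound image, None if missing
    cdfi: Optional[np.ndarray]    # CDFI image, None if missing
    clinical: np.ndarray          # clinical features (always present)
    label: int                    # tumor class label

    def eligible(self) -> bool:
        # Inclusion rule from the abstract: at most one image modality
        # may be missing, and clinical data must be present.
        return self.bmode is not None or self.cdfi is not None
```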

Methods

We developed a multimodal classification framework for liver tumor diagnosis, employing different feature extraction networks to explore how modality combinations affect classification performance when only the available modalities are used. DenseNet extracts CDFI features, while EfficientNet extracts B-mode ultrasound features. These features are flattened and concatenated with the clinical data via feature-level fusion to obtain a full-modality model. Modality weight parameters are introduced to emphasize the relative importance of each modality, yielding Model_D, which serves as the classification model after subsequent image-modality supplementation. When a modality is missing, generative models, including U-GAT-IT and MSA-GAN, perform cross-modal reconstruction of the missing B-mode ultrasound or CDFI image (e.g., reconstructing CDFI from B-mode ultrasound when CDFI is missing). After the usability of the generated images is verified, they are fed into Model_D as substitutes for the missing modalities. A minimal sketch of the fusion architecture follows.
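The abstract gives no implementation details; the following is a minimal PyTorch sketch of the described design (DenseNet for CDFI, EfficientNet for B-mode, flattened features fused with clinical data, learnable modality weights), assuming torchvision backbones. All feature dimensions, the clinical-feature size, and the class name ModalityWeightedFusion are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of feature-level fusion with learnable modality weights.
# Backbone choices follow the text; all sizes are illustrative.
import torch
import torch.nn as nn
import torchvision.models as tvm

class ModalityWeightedFusion(nn.Module):
    def __init__(self, n_clinical: int = 16, n_classes: int = 2):
        super().__init__()
        self.bmode_net = tvm.efficientnet_b0(weights=None)   # B-mode branch
        self.bmode_net.classifier = nn.Identity()            # -> 1280-d features
        self.cdfi_net = tvm.densenet121(weights=None)        # CDFI branch
        self.cdfi_net.classifier = nn.Identity()             # -> 1024-d features
        self.clinical_proj = nn.Linear(n_clinical, 64)       # clinical branch
        self.w = nn.Parameter(torch.ones(3))                 # learnable modality weights
        self.head = nn.Linear(1280 + 1024 + 64, n_classes)

    def forward(self, bmode, cdfi, clinical):
        w = torch.softmax(self.w, dim=0)                     # normalize the weights
        fb = w[0] * self.bmode_net(bmode)                    # weighted B-mode features
        fc = w[1] * self.cdfi_net(cdfi)                      # weighted CDFI features
        fk = w[2] * self.clinical_proj(clinical)             # weighted clinical features
        return self.head(torch.cat([fb, fc, fk], dim=1))    # feature-level fusion

# Usage, e.g.: ModalityWeightedFusion()(torch.randn(2, 3, 224, 224),
#                                       torch.randn(2, 3, 224, 224),
#                                       torch.randn(2, 16))
```

At inference with a missing modality, a trained generator (e.g., U-GAT-IT mapping B-mode to CDFI) would first synthesize the absent image, which then replaces the real input in this forward pass.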

Results

Model performance and the effect of modality supplementation were evaluated with accuracy, precision, recall, F1 score, and AUC. The proposed Model_D, which introduces modality weights, achieves 88.57% accuracy, 87.97% precision, 82.32% recall, an F1 score of 0.87, and an AUC of 0.95 on the full-modality liver tumor classification task. Moreover, images reconstructed across modalities by U-GAT-IT and MSA-GAN exhibit PSNR > 20 and multi-scale structural similarity (MS-SSIM) > 0.7, indicating moderate image quality with well-preserved global structure, suitable as substitute inputs when a modality is missing. With supplementary CDFI or B-mode ultrasound images, the model achieves 87.10% and 86.43% accuracy, respectively, with AUC values of 0.92 and 0.95. This indicates that even when certain modalities are absent, the generative models can effectively reconstruct the missing images, maintaining classification performance comparable to the complete-modality scenario. A sketch of the image-quality checks follows.
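The evaluation code is not given in the abstract; a hedged sketch of the reported quality gate (PSNR > 20 and MS-SSIM > 0.7), assuming 2D grayscale arrays in [0, 255] and the third-party pytorch-msssim package, could look like this:

```python
# Sketch of the image-quality gate used to judge generated images.
# Assumes 2D grayscale arrays in [0, 255]; thresholds are those reported
# in the Results. Note: MS-SSIM with default settings requires images of
# at least ~161 px per side, which typical ultrasound frames satisfy.
import numpy as np
import torch
from pytorch_msssim import ms_ssim

def psnr(real: np.ndarray, fake: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a real and a generated image."""
    mse = np.mean((real.astype(np.float64) - fake.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

def usable(real: np.ndarray, fake: np.ndarray) -> bool:
    """True if the generated image passes PSNR > 20 and MS-SSIM > 0.7."""
    to_t = lambda a: torch.from_numpy(np.asarray(a, dtype=np.float32))[None, None]
    score = ms_ssim(to_t(real), to_t(fake), data_range=255.0).item()
    return psnr(real, fake) > 20.0 and score > 0.7
```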

Conclusions

The generative model-based approach for modality reconstruction significantly improves the robustness of multimodal classification models, particularly in the context of liver tumor diagnosis. This method enhances the clinical applicability of multimodal models by ensuring high diagnostic accuracy despite missing modalities. Future work will explore further improvements in modality reconstruction techniques to increase the generalization and reliability of the model in various clinical settings.
Source journal

Computer Methods and Programs in Biomedicine (Engineering: Biomedical)
CiteScore: 12.30
Self-citation rate: 6.60%
Articles published: 601
Review time: 135 days
About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.