Resilient Sinkhorn-Based Optimal Transport Late Fusion Framework for Breast Cancer Diagnosis
Michael Asiedu Asare, Isaac Acquah, Benjamin Appiah Yeboah, Emmanuel Owusu
Cancer Informatics, Volume 25, Article 11769351261420789, published 2026-02-23. DOI: 10.1177/11769351261420789
Abstract
Objective: This research aims to develop and evaluate a clinically deployable multimodal deep learning framework for breast cancer diagnosis that remains robust even when clinical data are asynchronous, unpaired, or incomplete, thereby addressing real-world challenges posed by data heterogeneity and fragmented clinical workflows.
Methods: In this retrospective study, we developed a multimodal deep learning architecture that integrates histopathological images with structured clinical risk factors. A custom model was developed and trained independently for each modality, and the modality outputs were combined by late fusion through a dynamically reweighted Sinkhorn-based fusion layer. Performance was evaluated with the area under the precision-recall curve (PR-AUC), recall, F1 score, and Brier score under both complete and partial modality availability. Robustness and clinical utility were further assessed with statistical significance testing and decision curve analysis (DCA). The Sinkhorn cost matrix was additionally used to improve interpretability.
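The abstract does not specify how the fusion layer is built. As a point of reference, the following is a minimal sketch of the standard Sinkhorn-Knopp iteration for entropy-regularized optimal transport, the computation that Sinkhorn-based layers build on: a cost matrix C is turned into a Gibbs kernel K = exp(-C/eps), and row and column scalings are alternated until the transport plan P matches both marginals. Each entry P[i][j] states how much mass moves from source bin i to target bin j, which is what makes the plan and its cost matrix inspectable. The function names, the toy histograms, and the eps value are illustrative assumptions; this is not the authors' implementation.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iter=1000, tol=1e-9):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp scaling.

    a: source histogram (length n, sums to 1)
    b: target histogram (length m, sums to 1)
    cost: n x m cost matrix C
    eps: regularization strength (smaller = closer to exact OT, slower)
    Returns the plan P with P[i][j] = u[i] * K[i][j] * v[j].
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); for very small eps this can
    # underflow, and a log-domain variant would be needed instead.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Scale rows so that P matches the source marginal a
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        # Scale columns so that P matches the target marginal b
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [b[j] / Ktu[j] for j in range(m)]
        # Columns now match b exactly; stop once rows also match a
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        if max(abs(u[i] * Kv[i] - a[i]) for i in range(n)) < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total transport cost <P, C>."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


# Hypothetical toy example: two 2-bin histograms and a 0/1 cost matrix
C = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn([0.3, 0.7], [0.6, 0.4], C, eps=0.5)
print([[round(p, 3) for p in row] for row in plan])
print(round(transport_cost(plan, C), 3))
```

Lowering eps pushes the plan toward the unregularized optimum (here, moving only the unavoidable 0.3 of mass off the diagonal) at the cost of slower convergence; larger eps gives a smoother, more diffuse plan.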
Results: The proposed Sinkhorn fusion model outperformed all baseline methods, achieving the highest recall (0.96), PR-AUC (0.775), and F1 score (0.828), as well as the best calibration (Brier score ≈ 0.19). Notably, it maintained perfect recall (1.00) under 50% simulated modality dropout despite a significant drop in PR-AUC (20% vs 0%: t = -20.35, P < .0001; 50% vs 0%: t = 88.60, P < .0001), indicating strong overall robustness to missing information. Under internally controlled conditions, DCA showed superior clinical utility across threshold probabilities of 0.2 to 0.7.
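For readers reproducing this kind of evaluation, the reported metrics can be computed as in the minimal pure-Python sketch below, run on hypothetical toy labels and probabilities (not study data). It covers recall and F1 at a 0.5 threshold, the Brier score, average precision as a step-wise estimate of PR-AUC, and the decision-curve net benefit NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), compared against a treat-all strategy. The paper's exact PR-AUC estimator and classification threshold are not stated in the abstract, so these choices are assumptions.

```python
def confusion(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when predicting positive for p >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        positive = p >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_and_f1(y_true, y_prob, threshold=0.5):
    tp, fp, fn, _ = confusion(y_true, y_prob, threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def average_precision(y_true, y_prob):
    """Step-wise PR-AUC: mean precision at the rank of each positive.
    Ties in y_prob are broken by input order (a simplifying assumption)."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


def net_benefit(y_true, y_prob, pt):
    """Decision-curve net benefit of treating patients with p >= pt."""
    tp, fp, _, _ = confusion(y_true, y_prob, pt)
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of treating every patient, the usual DCA reference line."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical toy labels (1 = malignant) and fused probabilities
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]
print(recall_and_f1(y_true, y_prob))                # (0.75, 0.75)
print(round(brier_score(y_true, y_prob), 4))        # 0.1416
print(round(average_precision(y_true, y_prob), 4))  # 0.9167
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

scikit-learn provides equivalents (`recall_score`, `f1_score`, `brier_score_loss`, `average_precision_score`); its tie handling in `average_precision_score` may differ from this sketch.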
Conclusions: Because the model accommodates unpaired and incomplete clinical inputs while maintaining both calibration and sensitivity, it is particularly well suited for deployment in asynchronous and resource-constrained settings. Its consistent performance under clinical uncertainty, together with its minimal preprocessing requirements, represents a significant advancement toward equitable, reliable, and scalable AI-assisted breast cancer screening. To our knowledge, this is the first paper to model breast cancer late fusion as an optimal transport problem.
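To illustrate how a late-fusion model can still produce a prediction when one modality is missing, and how modality dropout can be simulated for a robustness test, the sketch below uses plain renormalized weighted averaging of per-modality probabilities. This is a deliberately simpler stand-in, not the paper's Sinkhorn-based fusion layer; the modality names, the weights, and the dropout rule (each modality masked independently, with at least one always kept) are hypothetical.

```python
import random


def fuse(probs, weights):
    """Renormalized weighted average over the modalities that are present.

    probs: dict mapping modality name -> P(malignant), or None if missing
    weights: dict mapping modality name -> nonnegative weight (hypothetical)
    """
    present = {m: p for m, p in probs.items() if p is not None}
    total = sum(weights[m] for m in present)
    if total <= 0:
        raise ValueError("need at least one present modality with positive weight")
    # Dividing by the weight of present modalities only keeps the output in [0, 1]
    return sum(weights[m] * p for m, p in present.items()) / total


def drop_modalities(probs, rate, rng):
    """Simulate modality dropout: mask each modality with probability `rate`,
    but always keep at least one so a prediction can still be made."""
    kept = {m: (None if rng.random() < rate else p) for m, p in probs.items()}
    if all(p is None for p in kept.values()):
        m = rng.choice(sorted(probs))
        kept[m] = probs[m]
    return kept


# Hypothetical per-modality outputs and fusion weights
weights = {"histopathology": 0.7, "clinical": 0.3}
probs = {"histopathology": 0.9, "clinical": 0.4}
print(fuse(probs, weights))                                      # both present
print(fuse({"histopathology": 0.9, "clinical": None}, weights))  # clinical missing

rng = random.Random(0)
for rate in (0.2, 0.5):
    fused = [fuse(drop_modalities(probs, rate, rng), weights) for _ in range(5)]
    print(rate, [round(p, 2) for p in fused])
```

Renormalizing over the available modalities is the simplest way to keep the fused score a valid probability when inputs are unpaired or incomplete; a learned or transport-based reweighting would replace the fixed `weights` dictionary.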
About the journal:
The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to high-quality, peer-reviewed manuscripts reporting bioinformatics analyses of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to reporting the use of computational methods in cancer research and practice, Cancer Informatics applies methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry to cancer detection, treatment, classification, risk prediction, prevention, outcome, and modeling.