{"title":"MMsurv: a multimodal multi-instance multi-cancer survival prediction model integrating pathological images, clinical information, and sequencing data.","authors":"Hailong Yang, Jia Wang, Wenyan Wang, Shufang Shi, Lijing Liu, Yuhua Yao, Geng Tian, Peizhen Wang, Jialiang Yang","doi":"10.1093/bib/bbaf209","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate prediction of patient survival rates in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, affecting confidence in predictions. This study presents MMSurv, an interpretable multimodal deep learning model to predict survival in different types of cancer. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into one-dimensional feature vectors. We then optimize clinical features by applying word embedding techniques, inspired by natural language processing, to the clinical data. To better utilize the complementarity of multimodal data, this study proposes a novel fusion method, multimodal fusion method based on compact bilinear pooling and transformer, which integrates bilinear pooling with Transformer architecture. The fused features are then processed through a dual-layer multi-instance learning model to remove prognosis-irrelevant image patches and predict each patient's survival risk. Furthermore, we employ cell segmentation to investigate the cellular composition within the tiles that received high attention from the model, thereby enhancing its interpretive capacity. We evaluate our approach on six cancer types from The Cancer Genome Atlas. The results demonstrate that utilizing multimodal data leads to higher predictive accuracy compared to using single-modal image data, with an average C-index increase from 0.6750 to 0.7283. Additionally, we compare our proposed baseline model with state-of-the-art methods using the C-index and five-fold cross-validation approach, revealing a significant average improvement of nearly 10% in our model's performance.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077396/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbaf209","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate prediction of patient survival in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, limiting confidence in their predictions. This study presents MMSurv, an interpretable multimodal deep learning model for predicting survival across different types of cancer. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into a one-dimensional feature vector. We then optimize the clinical features by applying word-embedding techniques, inspired by natural language processing, to the clinical data. To better exploit the complementarity of multimodal data, this study proposes a novel fusion method that integrates compact bilinear pooling with the Transformer architecture. The fused features are then processed through a dual-layer multi-instance learning model to remove prognosis-irrelevant image patches and predict each patient's survival risk. Furthermore, we employ cell segmentation to investigate the cellular composition of the tiles that receive high attention from the model, thereby enhancing its interpretability. We evaluate our approach on six cancer types from The Cancer Genome Atlas. The results demonstrate that utilizing multimodal data yields higher predictive accuracy than single-modal image data, with the average C-index increasing from 0.6750 to 0.7283. Additionally, we compare the proposed model against state-of-the-art methods using the C-index and five-fold cross-validation, revealing a significant average improvement of nearly 10% in our model's performance.
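To make the tiling-and-encoding step concrete, here is a minimal sketch in PyTorch. The abstract only states that tumor regions are cut into tiles and each tile is encoded by a neural network into a feature vector; the ResNet-50 backbone, 224x224 input size, and ImageNet normalization below are illustrative assumptions, not the paper's stated configuration.

```python
# Minimal sketch of per-tile feature encoding, assuming a ResNet-50
# backbone truncated before its classification head (2048-d output).
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(backbone.children())[:-1])
encoder.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def encode_tiles(tiles):
    """tiles: list of PIL images cut from the tumor regions of one WSI."""
    batch = torch.stack([preprocess(t) for t in tiles])  # (N, 3, 224, 224)
    return encoder(batch).flatten(1)                     # (N, 2048)
```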
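The fusion method pairs compact bilinear pooling (CBP) with a Transformer. The following is a hedged sketch of the general technique, not the paper's exact design: CBP approximates the outer product of two modality vectors via count sketches multiplied in the Fourier domain, and a Transformer encoder layer then models interactions across the fused tile tokens. All dimensions and the single-layer mixer are illustrative choices.

```python
# Compact bilinear pooling via count sketch + FFT (Charikar-style sketch),
# followed by one Transformer encoder layer over the fused tile tokens.
import torch
import torch.nn as nn

class CompactBilinearPooling(nn.Module):
    def __init__(self, dim1, dim2, out_dim):
        super().__init__()
        self.out_dim = out_dim
        # Fixed random hash indices and +/-1 signs for each modality.
        for name, dim in [("1", dim1), ("2", dim2)]:
            self.register_buffer(f"h{name}", torch.randint(out_dim, (dim,)))
            self.register_buffer(f"s{name}",
                                 torch.randint(0, 2, (dim,)).float() * 2 - 1)

    def _sketch(self, x, h, s):
        y = x.new_zeros(*x.shape[:-1], self.out_dim)
        return y.index_add(-1, h, x * s)

    def forward(self, x1, x2):
        # Circular convolution of the two sketches equals the sketch
        # of the outer product of x1 and x2.
        f1 = torch.fft.rfft(self._sketch(x1, self.h1, self.s1), dim=-1)
        f2 = torch.fft.rfft(self._sketch(x2, self.h2, self.s2), dim=-1)
        return torch.fft.irfft(f1 * f2, n=self.out_dim, dim=-1)

# Fuse per-tile image features with a broadcast clinical/omics vector,
# then let a Transformer layer mix information across tiles.
cbp = CompactBilinearPooling(dim1=2048, dim2=128, out_dim=512)
mixer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

tile_feats = torch.randn(1, 100, 2048)                # (batch, tiles, dim)
clin_feats = torch.randn(1, 1, 128).expand(1, 100, 128)
fused = mixer(cbp(tile_feats, clin_feats))            # (1, 100, 512)
```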
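For the multi-instance step, the abstract describes a dual-layer multi-instance learning model that suppresses prognosis-irrelevant patches. The sketch below shows standard gated-attention MIL pooling (Ilse et al., 2018) as one plausible single-layer building block; the paper's dual-layer variant is not reproduced here, and the `risk_head` name is ours.

```python
# Gated-attention MIL: tiles are weighted by learned attention, pooled
# into one slide-level vector, and mapped to a patient risk score.
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    def __init__(self, in_dim=512, hidden=256):
        super().__init__()
        self.V = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.U = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
        self.w = nn.Linear(hidden, 1)
        self.risk_head = nn.Linear(in_dim, 1)  # per-patient survival risk

    def forward(self, tiles):                  # tiles: (n_tiles, in_dim)
        attn = torch.softmax(self.w(self.V(tiles) * self.U(tiles)), dim=0)
        slide = (attn * tiles).sum(dim=0)      # attention-weighted bag vector
        return self.risk_head(slide), attn

mil = GatedAttentionMIL()
risk, attention = mil(torch.randn(100, 512))
# `attention` flags the high-attention tiles, the ones the paper later
# passes to cell segmentation for interpretation.
```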
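Evaluation uses the concordance index (C-index) under five-fold cross-validation. Below is a small self-contained sketch of Harrell's C-index: the fraction of comparable patient pairs whose predicted risks are ordered consistently with observed survival (higher risk implies shorter survival; ties score 0.5). The helper name `c_index` is ours, not from the paper.

```python
# Harrell's concordance index over (time, event, risk) triples.
import numpy as np

def c_index(times, events, risks):
    times, events, risks = map(np.asarray, (times, events, risks))
    num = den = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is an event
        comparable = times > times[i]
        den += comparable.sum()
        num += (risks[i] > risks[comparable]).sum()
        num += 0.5 * (risks[i] == risks[comparable]).sum()
    return num / den

# Toy check: perfectly ordered risks give a C-index of 1.0.
print(c_index([5, 8, 10], [1, 1, 0], [0.9, 0.5, 0.1]))  # -> 1.0
```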
Journal introduction:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.