Pathway-aware multimodal transformer (PAMT): Integrating pathological image and gene expression for interpretable cancer survival analysis.

IF 18.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-18 DOI:10.1109/tpami.2025.3611531

Rui Yan, Xueyuan Zhang, Zihang Jiang, Baizhi Wang, Xiuwu Bian, Fei Ren, S Kevin Zhou

Citations: 0

Abstract

Integrating multimodal data from pathological images and gene expression for cancer survival analysis can achieve better results than using a single modality. However, existing multimodal learning methods ignore fine-grained interactions between the two modalities, especially the interactions between biological pathways and pathological image patches. In this article, we propose a novel Pathway-Aware Multimodal Transformer (PAMT) framework for interpretable cancer survival analysis. Specifically, PAMT learns fine-grained modality interaction through three stages: (1) in the intra-modal pathway-pathway / patch-patch interaction stage, we use the Transformer model to perform intra-modal information interaction; (2) in the inter-modal pathway-patch alignment stage, we introduce a novel label-free contrastive loss to align semantic information between the modalities so that the features of the two modalities are mapped to the same semantic space; and (3) in the inter-modal pathway-patch fusion stage, to model the medical prior knowledge that "genotype determines phenotype", we propose a pathway-to-patch cross fusion module to perform inter-modal information interaction under the guidance of the pathway prior. In addition, the inter-modal cross fusion module gives PAMT good interpretability, helping a pathologist to screen which pathways play a key role, to locate which regions of the whole slide image (WSI) are affected by a pathway, and to mine prognosis-relevant pathology image patterns. Experimental results on three datasets (bladder urothelial carcinoma, lung squamous cell carcinoma, and lung adenocarcinoma) demonstrate that the proposed framework significantly outperforms state-of-the-art methods. Finally, based on the PAMT model, we develop a website that directly visualizes the impact of 186 pathways on all areas of a WSI, available at http://222.128.10.254:18822/#/.
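The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the sketch assumes plain scaled dot-product attention without learned projections, and it assumes an InfoNCE-style form for the label-free contrastive loss. All function names, dimensions, and the random embeddings are hypothetical and chosen only for illustration. Only the Python standard library is used.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys_values):
    """Scaled dot-product attention without learned projections (a simplification).

    Returns (mixed, weights): weights[i][j] is how strongly query i attends to
    item j, and mixed[i] is the resulting weighted average of the items.
    """
    scale = math.sqrt(len(queries[0]))
    weights = [softmax([dot(q, k) / scale for k in keys_values]) for q in queries]
    dim = len(keys_values[0])
    mixed = [
        [sum(w * kv[d] for w, kv in zip(row, keys_values)) for d in range(dim)]
        for row in weights
    ]
    return mixed, weights


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def alignment_loss(gene_vecs, image_vecs, temperature=0.1):
    """Assumed InfoNCE-style loss: gene vector i should match image vector i.

    It is label-free in the sense that only the pairing is used, never a
    survival label. The exact loss in the paper may differ.
    """
    genes = [normalize(v) for v in gene_vecs]
    images = [normalize(v) for v in image_vecs]
    total = 0.0
    for i, g in enumerate(genes):
        logits = [dot(g, img) / temperature for img in images]
        total -= math.log(softmax(logits)[i])
    return total / len(genes)


# Hypothetical toy inputs: 3 pathway embeddings and 5 patch embeddings.
random.seed(0)
DIM = 8
pathways = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
patches = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

# Stage 1: intra-modal interaction (self-attention within each modality).
pathways_ctx, _ = attention(pathways, pathways)
patches_ctx, _ = attention(patches, patches)

# Stage 3: pathway-to-patch fusion, with pathways as queries over patches.
fused, attn = attention(pathways_ctx, patches_ctx)

# Interpretability: for each pathway, the patch receiving the largest weight.
top_patch = [max(range(len(row)), key=row.__getitem__) for row in attn]
```

In this sketch, each row of `attn` is a distribution over patches for one pathway, which is the kind of pathway-to-region map the abstract describes for interpretability. Stage 2 corresponds to `alignment_loss`, which would be minimized during training over paired gene and image embeddings; it is defined here but not optimized.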

Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

28.40

Self-citation rate

3.00%

Articles published

885

Review time

8.5 months

About the journal: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.