Rui Yan,Xueyuan Zhang,Zihang Jiang,Baizhi Wang,Xiuwu Bian,Fei Ren,S Kevin Zhou
{"title":"Pathway-aware multimodal transformer (PAMT): Integrating pathological image and gene expression for interpretable cancer survival analysis.","authors":"Rui Yan,Xueyuan Zhang,Zihang Jiang,Baizhi Wang,Xiuwu Bian,Fei Ren,S Kevin Zhou","doi":"10.1109/tpami.2025.3611531","DOIUrl":null,"url":null,"abstract":"Integrating multimodal data of pathological image and gene expression for cancer survival analysis can achieve better results than using a single modality. However, existing multimodal learning methods ignore fine-grained interactions between both modalities, especially the interactions between biological pathways and pathological image patches. In this article, we propose a novel Pathway-Aware Multimodal Transformer (PAMT) framework for interpretable cancer survival analysis. Specifically, the PAMT learns fine-grained modality interaction through three stages: (1) In the intra-modal pathway-pathway / patch-patch interaction stage, we use the Transformer model to perform intra-modal information interaction; (2) In the inter-modal pathway-patch alignment stage, we introduce a novel label-free contrastive loss to aligns semantic information between different modalities so that the features of the two modalities are mapped to the same semantic space; and (3) In the inter-modal pathway-patch fusion stage, to model the medical prior knowledge of \"genotype determines phenotype\", we propose a pathway-to-patch cross fusion module to perform inter-modal information interaction under the guidance of pathway prior. In addition, the inter-modal cross fusion module of PAMT endows good interpretability, helping a pathologist to screen which pathway plays a key role, to locate where on whole slide image (WSI) are affected by the pathway, and to mine prognosis-relevant pathology image patterns. Experimental results based on three datasets of bladder urothelial carcinoma, lung squamous cell carcinoma, and lung adenocarcinoma demonstrate that the proposed framework significantly outperforms the state-of-the-art methods. Finally, based on the PAMT model, we develop a website that directly visualizes the impact of 186 pathways on all areas of WSI, available at http://222.128.10.254:18822/#/.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"78 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3611531","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Integrating multimodal data of pathological image and gene expression for cancer survival analysis can achieve better results than using a single modality. However, existing multimodal learning methods ignore fine-grained interactions between both modalities, especially the interactions between biological pathways and pathological image patches. In this article, we propose a novel Pathway-Aware Multimodal Transformer (PAMT) framework for interpretable cancer survival analysis. Specifically, the PAMT learns fine-grained modality interaction through three stages: (1) In the intra-modal pathway-pathway / patch-patch interaction stage, we use the Transformer model to perform intra-modal information interaction; (2) In the inter-modal pathway-patch alignment stage, we introduce a novel label-free contrastive loss to aligns semantic information between different modalities so that the features of the two modalities are mapped to the same semantic space; and (3) In the inter-modal pathway-patch fusion stage, to model the medical prior knowledge of "genotype determines phenotype", we propose a pathway-to-patch cross fusion module to perform inter-modal information interaction under the guidance of pathway prior. In addition, the inter-modal cross fusion module of PAMT endows good interpretability, helping a pathologist to screen which pathway plays a key role, to locate where on whole slide image (WSI) are affected by the pathway, and to mine prognosis-relevant pathology image patterns. Experimental results based on three datasets of bladder urothelial carcinoma, lung squamous cell carcinoma, and lung adenocarcinoma demonstrate that the proposed framework significantly outperforms the state-of-the-art methods. Finally, based on the PAMT model, we develop a website that directly visualizes the impact of 186 pathways on all areas of WSI, available at http://222.128.10.254:18822/#/.
期刊介绍:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.