HMT: A Hybrid Multimodal Transformer With Multitask Learning for Survival Prediction in Head and Neck Cancer

Jiaqi Cui; Yuanyuan Xu; Hanci Zheng; Xi Wu; Jiliu Zhou; Yuanjun Liu; Yan Wang

IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 9, no. 7, pp. 879-889. DOI: 10.1109/TRPMS.2025.3539739. Published 2025-02-12.
Abstract
Survival prediction is crucial for cancer patients as it offers prognostic information for treatment planning. Recently, deep learning-based multimodal survival prediction models have demonstrated promising performance. However, current models face challenges in effectively utilizing heterogeneous multimodal data (e.g., positron emission tomography (PET)/computed tomography (CT) images and clinical tabular data) and in extracting essential information from tumor regions, resulting in suboptimal survival prediction accuracy. To tackle these limitations, in this article, we propose a novel Hybrid Multimodal Transformer (HMT) for survival prediction from PET/CT images and clinical tabular data in Head and Neck (H&N) cancer. Specifically, we develop hybrid attention modules to capture intramodal information and intermodal correlations from multimodal PET/CT images. Moreover, we design hierarchical Tabular Affine Transformation Modules (TATMs) to integrate supplementary insights from the heterogeneous tabular data with images via affine transformations. The TATM dynamically emphasizes features contributing to survival prediction while suppressing irrelevant ones during integration. To achieve finer feature fusion, TATMs are hierarchically embedded into the network, allowing for consistent interaction between tabular and multimodal image features across multiple scales. To mitigate interference from irrelevant information, we introduce tumor segmentation as an auxiliary task to capture features related to tumor regions, thus enhancing prediction accuracy. Experiments demonstrate the superior performance of our method. The code is available at https://github.com/gluucose/HMT.
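To make the TATM idea concrete, below is a minimal, hedged sketch of a tabular-to-image affine modulation block. It is not the authors' implementation (see the linked repository for the official code); it assumes a FiLM-style design in which encoded clinical tabular features predict per-channel scale and shift parameters that are applied to fused PET/CT image features at one scale. The class and parameter names (`TabularAffineModule`, `tab_dim`, `img_channels`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class TabularAffineModule(nn.Module):
    """Hypothetical TATM-style block: clinical tabular features modulate image features."""

    def __init__(self, tab_dim: int, img_channels: int):
        super().__init__()
        # Map the tabular embedding to per-channel scale (gamma) and shift (beta).
        self.to_gamma = nn.Linear(tab_dim, img_channels)
        self.to_beta = nn.Linear(tab_dim, img_channels)

    def forward(self, img_feat: torch.Tensor, tab_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, D, H, W) fused PET/CT features at one scale
        # tab_feat: (B, tab_dim) encoded clinical tabular features
        gamma = self.to_gamma(tab_feat)[:, :, None, None, None]
        beta = self.to_beta(tab_feat)[:, :, None, None, None]
        # Affine transformation: emphasize informative channels, suppress irrelevant ones.
        return gamma * img_feat + beta


if __name__ == "__main__":
    tatm = TabularAffineModule(tab_dim=16, img_channels=32)
    img = torch.randn(2, 32, 8, 8, 8)  # toy fused image features
    tab = torch.randn(2, 16)           # toy clinical tabular embedding
    print(tatm(img, tab).shape)        # torch.Size([2, 32, 8, 8, 8])
```

In the paper, such modules are embedded hierarchically so that the tabular data can interact with image features at multiple scales; a full model would instantiate one block per decoder or encoder stage rather than the single block shown here.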