HMT: A Hybrid Multimodal Transformer With Multitask Learning for Survival Prediction in Head and Neck Cancer

Jiaqi Cui; Yuanyuan Xu; Hanci Zheng; Xi Wu; Jiliu Zhou; Yuanjun Liu; Yan Wang

IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 9, no. 7, pp. 879-889. DOI: 10.1109/TRPMS.2025.3539739. Published 2025-02-12.
Abstract
Survival prediction is crucial for cancer patients as it offers prognostic information for treatment planning. Recently, deep learning-based multimodal survival prediction models have demonstrated promising performance. However, current models face challenges in effectively utilizing heterogeneous multimodal data (e.g., positron emission tomography (PET)/computed tomography (CT) images and clinical tabular data) and in extracting essential information from tumor regions, resulting in suboptimal survival prediction accuracy. To tackle these limitations, in this article, we propose a novel Hybrid Multimodal Transformer (HMT) for survival prediction from PET/CT images and clinical tabular data in Head and Neck (H&N) cancer. Specifically, we develop hybrid attention modules to capture intramodal information and intermodal correlations from multimodal PET/CT images. Moreover, we design hierarchical Tabular Affine Transformation Modules (TATMs) to integrate supplementary insights from the heterogeneous tabular data with images via affine transformations. The TATM dynamically emphasizes features contributing to survival prediction while suppressing irrelevant ones during integration. To achieve finer feature fusion, TATMs are hierarchically embedded into the network, allowing for consistent interaction between tabular and multimodal image features across multiple scales. To mitigate interference from irrelevant information, we introduce tumor segmentation as an auxiliary task to capture features related to tumor regions, thus enhancing prediction accuracy. Experiments demonstrate the superior performance of our method. The code is available at https://github.com/gluucose/HMT.
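To make the TATM idea concrete, below is a minimal, hedged sketch of a tabular-to-image affine modulation block. It is not the authors' implementation (see the linked repository for the official code); it assumes a FiLM-style design in which encoded clinical tabular features predict per-channel scale and shift parameters that are applied to fused PET/CT image features at one scale. The class and parameter names (`TabularAffineModule`, `tab_dim`, `img_channels`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class TabularAffineModule(nn.Module):
    """Hypothetical TATM-style block: clinical tabular features modulate image features."""

    def __init__(self, tab_dim: int, img_channels: int):
        super().__init__()
        # Map the tabular embedding to per-channel scale (gamma) and shift (beta).
        self.to_gamma = nn.Linear(tab_dim, img_channels)
        self.to_beta = nn.Linear(tab_dim, img_channels)

    def forward(self, img_feat: torch.Tensor, tab_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, D, H, W) fused PET/CT features at one scale
        # tab_feat: (B, tab_dim) encoded clinical tabular features
        gamma = self.to_gamma(tab_feat)[:, :, None, None, None]
        beta = self.to_beta(tab_feat)[:, :, None, None, None]
        # Affine transformation: emphasize informative channels, suppress irrelevant ones.
        return gamma * img_feat + beta


if __name__ == "__main__":
    tatm = TabularAffineModule(tab_dim=16, img_channels=32)
    img = torch.randn(2, 32, 8, 8, 8)  # toy fused image features
    tab = torch.randn(2, 16)           # toy clinical tabular embedding
    print(tatm(img, tab).shape)        # torch.Size([2, 32, 8, 8, 8])
```

In the paper, such modules are embedded hierarchically so that the tabular data can interact with image features at multiple scales; a full model would instantiate one block per decoder or encoder stage rather than the single block shown here.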