Novel pre-spatial data fusion deep learning approach for multimodal volumetric outcome prediction models in radiotherapy

IF 3.2 | Zone 2, Medicine | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging

Medical Physics. Pub Date: 2025-02-10. DOI: 10.1002/mp.17672

John C. Asbach, Anurag K. Singh, Austin J. Iovoli, Mark Farrugia, Anh H. Le

Medical Physics, volume 52, issue 4, pages 2675-2687. Full text: https://onlinelibrary.wiley.com/doi/10.1002/mp.17672

Citations: 0

Abstract

Background

With the recent emphasis on multimodal neural networks for complex modeling tasks, outcome prediction for a course of treatment can be framed as a fundamentally multimodal problem. A patient's response to treatment varies with their specific anatomy and the proposed treatment plan; these factors are spatial and closely related. Other factors may also matter, such as non-spatial descriptive clinical characteristics, which can be structured as tabular data. It is critical to give models as comprehensive a patient representation as possible, but inputs with differing data structures are incompatible in raw form, so traditional models that consider these inputs require feature engineering before modeling. In neural networks, feature engineering can be integrated into the model itself, under one governing optimization, rather than performed prescriptively beforehand. The native incompatibility of the different data structures must still be addressed, and methods that reconcile structural incompatibilities in multimodal model inputs are called data fusion. We present a novel joint early pre-spatial (JEPS) fusion technique and demonstrate that differences in fusion approach can produce significant differences in model performance even when the data are identical.
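To make the structural incompatibility concrete, the following minimal sketch shows two generic ways a tabular vector and a multi-channel volume can be brought into one input. It is an illustration only: the abstract does not specify how JEPS merges the modalities, so the broadcast-as-channels mechanism, the function names, the channel meanings, and all array sizes here are assumptions, not the authors' method.

```python
import numpy as np

def fuse_at_input(volume, tabular):
    """Fuse before any convolution (assumed mechanism): tile each tabular
    feature into a constant-valued channel and stack it with the image channels."""
    _, depth, height, width = volume.shape
    tab_channels = np.broadcast_to(
        tabular[:, None, None, None], (tabular.size, depth, height, width)
    )
    return np.concatenate([volume, tab_channels], axis=0)

def fuse_after_features(volume_features, tabular):
    """Fuse after spatial processing: concatenate a flat feature vector
    derived from the volume with the tabular vector."""
    return np.concatenate([volume_features, tabular])

rng = np.random.default_rng(0)
# Hypothetical input: 3 channels (e.g. CT, dose, structure mask) on an 8x16x16 grid.
volume = rng.normal(size=(3, 8, 16, 16))
# Hypothetical, already-scaled clinical features.
tabular = np.array([0.62, 1.0, 0.0, 0.35])

fused_volume = fuse_at_input(volume, tabular)
# Global average pooling stands in for a trained CNN feature extractor.
pooled = volume.mean(axis=(1, 2, 3))
fused_vector = fuse_after_features(pooled, tabular)

print(fused_volume.shape)  # (7, 8, 16, 16)
print(fused_vector.shape)  # (7,)
```

In the first case the tabular values are visible to every subsequent spatial operation; in the second they only meet the image information after it has been reduced to a vector. Where in the network that meeting happens is the design choice the fusion comparison in this study is about.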

Purpose

To present a novel pre-spatial fusion technique for volumetric neural networks and demonstrate its impact on model performance for pretreatment prediction of overall survival (OS).

Methods

From a retrospective cohort of 531 head and neck patients treated at our clinic, we prepared an OS dataset of 222 data-complete cases at a 2-year post-treatment time threshold. Each patient's data included CT imaging, dose array, approved structure set, and a tabular summary of the patient's demographics and survey data. To establish single-modality baselines, we fit both a Cox Proportional Hazards (CPH) model and a dense neural network on only the tabular data, and we trained a 3D convolutional neural network (CNN) on only the volume data. We then trained five competing architectures for fusion of both modalities: two early fusion models, a late fusion model, a traditional joint fusion model, and the novel JEPS model, in which clinical data is merged into training upstream of most convolution operations. We used standardized 10-fold cross-validation to directly compare the performance of all models on identical train/test splits of patients, using area under the receiver operating characteristic curve (AUC) as the primary performance metric. We used a two-tailed Student's t-test to assess the statistical significance (p-value threshold 0.05) of any observed performance differences.
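The evaluation protocol described above (fixed 10-fold splits shared by every model, scored by AUC) can be sketched in plain Python as follows. This is a hedged illustration, not the authors' pipeline: the helper names `kfold_indices` and `auc`, the seed, and the unstratified shuffling are assumptions; only the case count (222) and the fold count (10) come from the text.

```python
import random

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle once with a fixed seed, then deal indices into folds, so every
    model can be evaluated on exactly the same train/test partitions."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    return [(sorted(set(order) - set(test)), sorted(test)) for test in folds]

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Same splits for every model: compute them once and reuse them.
splits = kfold_indices(222, n_folds=10, seed=0)
print(len(splits), [len(test) for _, test in splits])

# Toy check of the metric on four hypothetical cases.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Computing the splits once and passing them to every model is what makes the per-fold AUC values directly comparable across architectures. A stratified variant, which keeps the outcome ratio similar in every fold, would be a natural refinement; the abstract does not say whether one was used.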

Results

The JEPS design scored the highest, achieving a mean AUC of 0.779 ± 0.080. The late fusion model and clinical-only CPH model scored second and third highest with 0.746 ± 0.066 and 0.720 ± 0.091 mean AUC, respectively. The performance differences between these three models were not statistically significant. All other comparison models scored significantly worse than the top performing JEPS model.
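As a rough plausibility check on the reported non-significance, the sketch below recomputes a Student t statistic from the summary values alone. It rests on assumptions the abstract does not confirm: that each "±" value is the sample standard deviation across the 10 folds, and that the test was unpaired with pooled variance. A paired test on per-fold differences would give different numbers, so this is not a reproduction of the authors' analysis.

```python
import math

def pooled_t(mean_a, sd_a, mean_b, sd_b, n):
    """Student t statistic for two groups of equal size n, pooled variance."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
    standard_error = math.sqrt(pooled_var * 2 / n)
    return (mean_a - mean_b) / standard_error

# Two-tailed critical value for alpha = 0.05 with 2 * 10 - 2 = 18 degrees of freedom.
T_CRIT = 2.101

jeps_mean, jeps_sd = 0.779, 0.080
others = {"late fusion": (0.746, 0.066), "CPH": (0.720, 0.091)}

results = {}
for name, (mean, sd) in others.items():
    t = pooled_t(jeps_mean, jeps_sd, mean, sd, n=10)
    results[name] = t
    print(f"JEPS vs {name}: t = {t:.2f}, significant = {abs(t) > T_CRIT}")
```

Under these assumptions both statistics fall below the critical value, which is consistent with the statement that the differences among the top three models were not statistically significant.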

Conclusion

For our OS evaluation, our JEPS fusion architecture achieves better integration of inputs and significantly improves predictive performance over most common multimodal approaches. The JEPS fusion technique is easily applied to any volumetric CNN.


Source journal

Medical Physics (Medicine: Nuclear Medicine)

CiteScore

6.80

Self-citation rate

15.80%

Articles published

660

Review time

1.7 months

About the journal: Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in (1) basic science developments with high potential for clinical translation, (2) clinical applications of cutting-edge engineering and physics innovations, and (3) broadly applicable and innovative clinical physics developments. Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.