Validation of an Artificial Intelligence-Based Prediction Model Using 5 External PET/CT Datasets of Diffuse Large B-Cell Lymphoma
Maria C Ferrández, Sandeep S V Golla, Jakoba J Eertink, Sanne E Wiegers, Gerben J C Zwezerijnen, Martijn W Heymans, Pieternella J Lugtenburg, Lars Kurch, Andreas Hüttmann, Christine Hanoun, Ulrich Dührsen, Sally F Barrington, N George Mikhaeel, Luca Ceriani, Emanuele Zucca, Sándor Czibor, Tamás Györke, Martine E D Chamuleau, Josée M Zijlstra, Ronald Boellaard
Journal of nuclear medicine : official publication, Society of Nuclear Medicine, pages 1802-1807. Published 2024-11-01. DOI: 10.2967/jnumed.124.268191 (https://doi.org/10.2967/jnumed.124.268191)
Abstract
The aim of this study was to validate a previously developed deep learning model in 5 independent clinical trials. The predictive performance of this model was compared with the international prognostic index (IPI) and 2 models incorporating radiomic PET/CT features (clinical PET and PET models). Methods: In total, 1,132 diffuse large B-cell lymphoma patients were included: 296 for training and 836 for external validation. The primary outcome was 2-y time to progression. The deep learning model was trained on maximum-intensity projections from PET/CT scans. The clinical PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, SUVpeak, age, and performance status. The PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, and SUVpeak. Model performance was assessed using the area under the curve (AUC) and Kaplan-Meier curves. Results: The IPI yielded an AUC of 0.60 on all external data. The deep learning model yielded a significantly higher AUC of 0.66 (P < 0.01). For each individual clinical trial, the model was consistently better than IPI. Radiomic model AUCs remained higher for all clinical trials. The deep learning and clinical PET models showed equivalent performance (AUC, 0.69; P > 0.05). The PET model yielded the highest AUC of all models (AUC, 0.71; P < 0.05). Conclusion: The deep learning model predicted outcome in all trials with a higher performance than IPI and better survival curve separation. This model can predict treatment outcome in diffuse large B-cell lymphoma without tumor delineation but at the cost of a lower prognostic performance than with radiomics.
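Two computational steps named in the abstract, maximum-intensity projection (MIP) and the AUC, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names (`coronal_mip`, `sagittal_mip`, `roc_auc`), the `volume[z][y][x]` axis convention, and the treatment of 2-y progression as a plain binary label with no censoring are all assumptions made for illustration; the abstract does not specify how projections were generated or how censored patients were handled.

```python
from typing import List, Sequence

# Hypothetical layout: a PET volume as nested lists indexed volume[z][y][x].
Volume = List[List[List[float]]]


def coronal_mip(volume: Volume) -> List[List[float]]:
    """Project along y: result[z][x] is the maximum voxel value over all y."""
    return [[max(column) for column in zip(*axial_slice)] for axial_slice in volume]


def sagittal_mip(volume: Volume) -> List[List[float]]:
    """Project along x: result[z][y] is the maximum voxel value over all x."""
    return [[max(row) for row in axial_slice] for axial_slice in volume]


def roc_auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """AUC as the probability that a patient with an event (1) receives a
    higher score than a patient without one (0); ties count as one half.

    Assumption: the outcome is a binary label (progression within 2 y or not),
    which ignores censoring.
    """
    positives = [s for s, e in zip(scores, events) if e == 1]
    negatives = [s for s, e in zip(scores, events) if e == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy 2x2x2 volume and toy predictions, invented for illustration only.
    toy_volume = [[[1.0, 2.0], [3.0, 0.0]], [[0.0, 5.0], [4.0, 1.0]]]
    print(coronal_mip(toy_volume))
    print(sagittal_mip(toy_volume))
    print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.60 (IPI), 0.66 (deep learning), 0.69 (clinical PET), and 0.71 (PET) should be read. The pairwise form is quadratic in the number of patients; it is chosen here for clarity, not efficiency.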