Ronald Boellaard, Arman Rahmim, Jacoba J. Eertink, Ulrich Duehrsen, Lars Kurch, Pieternella J. Lugtenburg, Sanne E. Wiegers, Gerben J.C. Zwezerijnen, Josée M. Zijlstra, Martijn W. Heymans, Irène Buvat
The Journal of Nuclear Medicine. Published 2025-06-12. DOI: 10.2967/jnumed.124.269425 (https://doi.org/10.2967/jnumed.124.269425)
Summary Report of the SNMMI AI Task Force Radiomics Challenge 2024
In medical imaging, challenges are competitions that aim to provide a fair comparison of different methodologic solutions to a common problem. Challenges typically focus on addressing real-world problems, such as segmentation, detection, and prediction tasks, using various types of medical images and associated data. Here, we describe the organization and results of such a challenge to compare machine-learning models for predicting survival in patients with diffuse large B-cell lymphoma using a baseline 18F-FDG PET/CT radiomics dataset.
Methods: This challenge aimed to predict progression-free survival (PFS) in patients with diffuse large B-cell lymphoma, either as a binary outcome (shorter than 2 y versus longer than 2 y) or as a continuous outcome (survival in months). All participants were provided with a radiomic training dataset, including the ground truth survival for designing a predictive model and a radiomic test dataset without ground truth. Figures of merit (FOMs) used to assess model performance were the root-mean-square error for continuous outcomes and the C-index for 1-, 2-, and 3-y PFS binary outcomes. The challenge was endorsed and initiated by the Society of Nuclear Medicine and Molecular Imaging AI Task Force.
Results: Nineteen models for predicting PFS as a continuous outcome from 15 teams were received. Among those models, external validation identified 6 models showing similar performance to that of a simple general linear reference model using SUV and total metabolic tumor volumes (TMTV) only. Twelve models for predicting binary outcomes were submitted by 9 teams. External validation showed that 1 model had higher, but nonsignificant, C-index values compared with values obtained by a simple logistic regression model using SUV and TMTV.
Conclusion: Some of the radiomic-based machine-learning models developed by participants showed better FOMs than did simple linear or logistic regression models based on SUV and TMTV only, although the differences in observed FOMs were nonsignificant. This suggests that, for the challenge dataset, there was limited or no value seen from the addition of sophisticated radiomic features and use of machine learning when developing models for outcome prediction.
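The two figures of merit named in the Methods can be illustrated with a small sketch. This is not the organizers' evaluation code: the function names, the toy values, and the 24-month cutoff used to derive binary labels are assumptions chosen for illustration, and censoring is ignored for simplicity. For a binary outcome, the C-index is the fraction of (event, non-event) patient pairs in which the patient with the event received the higher predicted risk, which coincides with the area under the ROC curve.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted PFS (months)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def c_index_binary(labels, risk_scores):
    """C-index for a binary outcome (1 = event, 0 = no event).

    Counts the (event, non-event) pairs in which the event case has the
    higher predicted risk; tied scores contribute 0.5.
    """
    events = [s for y, s in zip(labels, risk_scores) if y == 1]
    non_events = [s for y, s in zip(labels, risk_scores) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy data (hypothetical, not taken from the challenge dataset).
pfs_months = [6.0, 30.0, 14.0, 48.0]
predicted_months = [10.0, 26.0, 20.0, 40.0]

# Binary outcome: 1 if PFS is shorter than 2 y (24 months), else 0.
labels = [1 if m < 24 else 0 for m in pfs_months]
risk_scores = [0.8, 0.3, 0.4, 0.6]  # hypothetical predicted risk of early progression

print(f"RMSE: {rmse(pfs_months, predicted_months):.2f} months")
print(f"C-index: {c_index_binary(labels, risk_scores):.2f}")
```

A lower root-mean-square error and a higher C-index both indicate better performance; a C-index of 0.5 corresponds to chance-level ranking.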