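Impact of intensity standardisation and ComBat batch size on clinical-radiomic prognostic models performance in a multi-centre study of patients with glioblastoma.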

IF 4.7, Zone 2 (Medicine), Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

European Radiology Pub Date : 2025-06-01 Epub Date: 2024-11-28 DOI:10.1007/s00330-024-11168-7

Kavi Fatania, Russell Frood, Hitesh Mistry, Susan C Short, James O'Connor, Andrew F Scarsbrook, Stuart Currie

European Radiology, pages 3354-3366. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12081554/pdf/

Citations: 0

Abstract

Impact of intensity standardisation and ComBat batch size on clinical-radiomic prognostic models performance in a multi-centre study of patients with glioblastoma.

Purpose: To assess the effect of different intensity standardisation techniques (ISTs) and ComBat batch sizes on radiomics survival model performance and stability in a heterogenous, multi-centre cohort of patients with glioblastoma (GBM).

Methods: Multi-centre pre-operative MRI acquired between 2014 and 2020 in patients with IDH-wildtype unifocal WHO grade 4 GBM were retrospectively evaluated. WhiteStripe (WS), Nyul histogram matching (HM), and Z-score (ZS) ISTs were applied before radiomic feature (RF) extraction. RFs were realigned using ComBat and minimum batch size (MBS) of 5, 10, or 15 patients. Cox proportional hazards models for overall survival (OS) prediction were produced using five different selection strategies and the impact of IST and MBS was evaluated using bootstrapping. Calibration, discrimination, relative explained variation, and model fit were assessed. Instability was evaluated using 95% confidence intervals (95% CIs), feature selection frequency and calibration curves across the bootstrap resamples.
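Two of the preprocessing steps named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `zscore_standardise` and `filter_by_min_batch_size` are hypothetical, the Z-score is assumed to use the mean and standard deviation of voxels inside a mask, and batches below the minimum batch size are assumed to be excluded before ComBat (the abstract only notes that a higher MBS reduced the sample size). The toy protocol labels are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_standardise(intensities, mask):
    """Z-score all voxel intensities using the mean/SD of voxels where mask is True."""
    inside = [v for v, m in zip(intensities, mask) if m]
    mu = mean(inside)
    sigma = pstdev(inside)
    if sigma == 0:
        raise ValueError("mask region has zero intensity variance")
    return [(v - mu) / sigma for v in intensities]


def filter_by_min_batch_size(batch_labels, min_batch_size):
    """Return indices of patients whose batch has at least min_batch_size members.

    Assumption: smaller batches are dropped rather than merged.
    """
    counts = Counter(batch_labels)
    return [i for i, b in enumerate(batch_labels) if counts[b] >= min_batch_size]


# Hypothetical toy cohort: one acquisition-protocol label per patient.
protocols = ["A"] * 12 + ["B"] * 7 + ["C"] * 3

for mbs in (5, 10, 15):
    kept = filter_by_min_batch_size(protocols, mbs)
    print(f"MBS={mbs}: {len(kept)} of {len(protocols)} patients retained")
```

In this toy cohort, raising the threshold shrinks the retained sample, which is the trade-off between batch size and sample size that the study examines.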

Results: One hundred ninety-five patients were included. Median OS = 13 (95% CI: 12-14) months. Twelve to fourteen unique MRI protocols were used per MRI sequence. HM and WS produced the highest relative increase in model discrimination, explained variation and model fit but IST choice did not greatly impact on stability, nor calibration. Larger ComBat batches improved discrimination, model fit, and explained variation but higher MBS (reduced sample size) reduced stability (across all performance metrics) and reduced calibration accuracy.
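To make the link between discrimination and stability concrete, the sketch below computes Harrell's concordance index and a percentile bootstrap 95% interval for it. It is an illustration only: the abstract does not specify which discrimination metric or bootstrap scheme the authors used, so both are assumptions here, and the function names and toy data are hypothetical.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the higher-risk patient died earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(times, events, risks, n_boot=500, seed=0):
    """Percentile 95% interval of the C-index over bootstrap resamples of patients."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(concordance_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risks[k] for k in idx],
            ))
        except ValueError:
            continue  # skip resamples with no comparable pairs
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[max(int(0.975 * len(stats)) - 1, 0)]
    return lower, upper


# Hypothetical toy data: survival in months, event flag (1 = death), model risk score.
times = [5, 8, 12, 13, 20, 25]
events = [1, 1, 0, 1, 1, 0]
risks = [0.9, 0.7, 0.8, 0.4, 0.5, 0.1]

print("C-index:", concordance_index(times, events, risks))
print("95% bootstrap interval:", bootstrap_ci(times, events, risks))
```

A wider interval across resamples indicates a less stable estimate, which is the sense in which a smaller retained sample can raise apparent performance while reducing stability.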

Conclusion: Heterogenous, real-world GBM data poses a challenge to the reproducibility of radiomics. ComBat generally improved model performance as MBS increased but reduced stability and calibration. HM and WS tended to improve model performance.

Key points: Question: ComBat harmonisation of RFs and intensity standardisation of MRI have not been thoroughly evaluated in multicentre, heterogeneous GBM data. Findings: The addition of ComBat and ISTs can improve discrimination, relative model fit, and explained variance but degrades the calibration and stability of survival models. Clinical relevance: Radiomics risk prediction models in real-world, multicentre contexts could be improved by ComBat and ISTs, however, this degrades calibration and prediction stability and this must be thoroughly investigated before patients can be accurately separated into different risk groups.

Source journal

European Radiology (Medicine: Nuclear Medicine)

CiteScore: 11.60

Self-citation rate: 8.50%

Articles published: 874

Review time: 2-4 weeks

About the journal: European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004 to 2008, supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.