Andres A Kohan, S A Mirshahvalad, R Hinzpeter, R Kulanthaivelu, L Avery, C Ortega, U Metser, A Hope, P Veit-Haibach
Journal: Academic Radiology (impact factor 3.8; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging)
Published: 2025-05-15
DOI: 10.1016/j.acra.2025.04.064 (https://doi.org/10.1016/j.acra.2025.04.064)
Citations: 0
Abstract
External Validation of a CT-Based Radiogenomics Model for the Detection of EGFR Mutation in NSCLC and the Impact of Prevalence in Model Building by Using Synthetic Minority Over Sampling (SMOTE): Lessons Learned.
Rationale and objectives: Radiogenomics holds promise in identifying molecular alterations in nonsmall cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and assess the effect of EGFR mutation prevalence on model performance through synthetic minority oversampling technique (SMOTE).
Materials and methods: The original radiogenomics model was validated on an independent NIH cohort (n=140). To assess the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from the original data, six from SMOTE-augmented data), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using the area under the receiver operating characteristic curve (AUC), and differences between radiomic-only, clinical-only, and combined models were statistically assessed.
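The prevalence simulation described above can be sketched as follows. This is a minimal from-scratch illustration, not the authors' pipeline or any library's API: the helper names `synthetic_needed` and `smote` are hypothetical, the class counts and features are toy values, and the 5-point spacing of the six target prevalences is an assumption (the abstract gives only the 25% to 50% range).

```python
import math
import random


def synthetic_needed(n_pos, n_neg, target_pct):
    """Synthetic positives needed so positives make up target_pct percent.

    Solves (n_pos + k) / (n_pos + k + n_neg) = target_pct / 100 for k,
    using integer arithmetic and rounding up.
    """
    numerator = target_pct * n_neg - (100 - target_pct) * n_pos
    if numerator <= 0:
        return 0  # prevalence is already at or above the target
    return -(-numerator // (100 - target_pct))  # ceiling division


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments joining a minority sample
    to one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (p - b) for b, p in zip(base, partner))
        )
    return synthetic


# Toy data only: 10 mutated vs 90 wild-type cases, two made-up features.
toy_rng = random.Random(1)
toy_minority = [(toy_rng.gauss(1, 1), toy_rng.gauss(1, 1)) for _ in range(10)]
for pct in (25, 30, 35, 40, 45, 50):  # assumed 5-point steps
    n_new = synthetic_needed(len(toy_minority), 90, pct)
    augmented = toy_minority + smote(toy_minority, n_new)
    achieved = 100 * len(augmented) / (len(augmented) + 90)
    print(pct, n_new, round(achieved, 1))
```

In a setup like this, the oversampling would be applied to the training data only, so that the external test cohort keeps its natural prevalence.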
Results: External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance.
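One possible reading of the sensitivity and AUC pattern is that AUC depends only on how cases are ranked, while sensitivity depends on where scores fall relative to a fixed cutoff. The toy sketch below uses hypothetical scores (not study data) and illustrative helper names `auc` and `sensitivity`. It assumes that oversampling mainly shifts all predicted scores upward, which the abstract does not state.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative
    (ties count as half), i.e. the rank-based definition of AUC."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity(scores_pos, threshold):
    """Fraction of true positives scored at or above the threshold."""
    return sum(s >= threshold for s in scores_pos) / len(scores_pos)


# Hypothetical model scores in percent (integers keep the arithmetic exact).
pos = [20, 40, 45, 70]      # EGFR-mutated cases
neg = [10, 30, 35, 50, 60]  # wild-type cases
shift = 20                  # assumed uniform upward shift after oversampling

pos_shifted = [s + shift for s in pos]
neg_shifted = [s + shift for s in neg]

# Ranking is unchanged, so AUC is identical before and after the shift.
print(auc(pos, neg), auc(pos_shifted, neg_shifted))
# More positives clear the fixed 50% cutoff, so sensitivity rises.
print(sensitivity(pos, 50), sensitivity(pos_shifted, 50))
```

Under this assumption, a gain in sensitivity at a fixed cutoff comes with a loss in specificity and leaves discrimination, as measured by AUC, unchanged.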
Conclusion: External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value. SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, in EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data-sharing is essential for future clinical implementation of radiogenomic models.
About the journal:
Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.