基于ct的放射基因组学模型检测非小细胞肺癌EGFR突变的外部验证,以及使用合成少数过采样(SMOTE)建立模型时患病率的影响:经验教训。

IF 3.8 2区 医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Andres A Kohan, S A Mirshahvalad, R Hinzpeter, R Kulanthaivelu, L Avery, C Ortega, U Metser, A Hope, P Veit-Haibach
{"title":"基于ct的放射基因组学模型检测非小细胞肺癌EGFR突变的外部验证,以及使用合成少数过采样(SMOTE)建立模型时患病率的影响:经验教训。","authors":"Andres A Kohan, S A Mirshahvalad, R Hinzpeter, R Kulanthaivelu, L Avery, C Ortega, U Metser, A Hope, P Veit-Haibach","doi":"10.1016/j.acra.2025.04.064","DOIUrl":null,"url":null,"abstract":"<p><strong>Rationale and objectives: </strong>Radiogenomics holds promise in identifying molecular alterations in nonsmall cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and assess the effect of EGFR mutation prevalence on model performance through synthetic minority oversampling technique (SMOTE).</p><p><strong>Materials and methods: </strong>The original radiogenomics model was validated on an independent NIH cohort (n=140). For assessing the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from original data, six SMOTE-augmented), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using area under the receiver operating characteristic curve (Area Under the Curve [AUC]), and differences between radiomic-only, clinical-only, and combined models were statistically assessed.</p><p><strong>Results: </strong>External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance.</p><p><strong>Conclusion: </strong>External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value. SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, in EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data-sharing is essential for future clinical implementation of radiogenomic models.</p>","PeriodicalId":50928,"journal":{"name":"Academic Radiology","volume":" ","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"External Validation of a CT-Based Radiogenomics Model for the Detection of EGFR Mutation in NSCLC and the Impact of Prevalence in Model Building by Using Synthetic Minority Over Sampling (SMOTE): Lessons Learned.\",\"authors\":\"Andres A Kohan, S A Mirshahvalad, R Hinzpeter, R Kulanthaivelu, L Avery, C Ortega, U Metser, A Hope, P Veit-Haibach\",\"doi\":\"10.1016/j.acra.2025.04.064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Rationale and objectives: </strong>Radiogenomics holds promise in identifying molecular alterations in nonsmall cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and assess the effect of EGFR mutation prevalence on model performance through synthetic minority oversampling technique (SMOTE).</p><p><strong>Materials and methods: </strong>The original radiogenomics model was validated on an independent NIH cohort (n=140). For assessing the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from original data, six SMOTE-augmented), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using area under the receiver operating characteristic curve (Area Under the Curve [AUC]), and differences between radiomic-only, clinical-only, and combined models were statistically assessed.</p><p><strong>Results: </strong>External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance.</p><p><strong>Conclusion: </strong>External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value. SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, in EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data-sharing is essential for future clinical implementation of radiogenomic models.</p>\",\"PeriodicalId\":50928,\"journal\":{\"name\":\"Academic Radiology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Academic Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.acra.2025.04.064\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Academic Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.acra.2025.04.064","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0

摘要

原理和目的:放射基因组学有望利用成像特征识别非小细胞肺癌(NSCLC)的分子改变。此前,我们开发了一种放射基因组学模型,基于对比增强计算机断层扫描(CECT)预测非小细胞肺癌患者的表皮生长因子受体(EGFR)突变。目前的研究旨在使用基于美国国立卫生研究院(NIH)的公开NSCLC数据集对该模型进行外部验证,并通过合成少数过采样技术(SMOTE)评估EGFR突变流行率对模型性能的影响。材料和方法:原始放射基因组学模型在一个独立的NIH队列(n=140)中得到验证。为了评估疾病患病率的影响,创建了6个smote增强数据集,模拟EGFR突变患病率从25%到50%。开发了七个模型(一个来自原始数据,六个smote增强),每个模型都经过严格的交叉验证、特征选择和逻辑回归建模。模型在NIH队列中进行了测试。使用受试者工作特征曲线下的面积(area under the curve [AUC])对性能进行比较,并对放射组学模型、临床模型和联合模型之间的差异进行统计评估。结果:外部验证显示,我们的模型和先前发表的EGFR放射组学模型的诊断性能都很差(AUC ~ 0.5)。单独使用临床模型诊断准确率更高(AUC为0.74)。与临床模型相比,smote增强模型显示敏感性增加,但没有改善总体AUC。改变EGFR突变发生率对AUC的影响最小,挑战了之前关于样本不平衡对模型性能影响的假设。结论:外部验证无法再现先前放射基因组学模型的性能,而单独的临床变量保留了很强的预测价值。基于smote的过采样并没有提高诊断的准确性,这表明放射组学在EGFR预测中除了临床数据之外可能提供有限的价值。强调强大的外部验证和数据共享对于放射基因组模型的未来临床实施至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
External Validation of a CT-Based Radiogenomics Model for the Detection of EGFR Mutation in NSCLC and the Impact of Prevalence in Model Building by Using Synthetic Minority Over Sampling (SMOTE): Lessons Learned.

Rationale and objectives: Radiogenomics holds promise in identifying molecular alterations in nonsmall cell lung cancer (NSCLC) using imaging features. Previously, we developed a radiogenomics model to predict epidermal growth factor receptor (EGFR) mutations based on contrast-enhanced computed tomography (CECT) in NSCLC patients. The current study aimed to externally validate this model using a publicly available National Institutes of Health (NIH)-based NSCLC dataset and assess the effect of EGFR mutation prevalence on model performance through synthetic minority oversampling technique (SMOTE).

Materials and methods: The original radiogenomics model was validated on an independent NIH cohort (n=140). For assessing the influence of disease prevalence, six SMOTE-augmented datasets were created, simulating EGFR mutation prevalence from 25% to 50%. Seven models were developed (one from original data, six SMOTE-augmented), each undergoing rigorous cross-validation, feature selection, and logistic regression modeling. Models were tested against the NIH cohort. Performance was compared using area under the receiver operating characteristic curve (Area Under the Curve [AUC]), and differences between radiomic-only, clinical-only, and combined models were statistically assessed.

Results: External validation revealed poor diagnostic performance for both our model and a previously published EGFR radiomics model (AUC ∼0.5). The clinical model alone achieved higher diagnostic accuracy (AUC 0.74). SMOTE-augmented models showed increased sensitivity but did not improve overall AUC compared to the clinical-only model. Changing EGFR mutation prevalence had minimal impact on AUC, challenging previous assumptions about the influence of sample imbalance on model performance.

Conclusion: External validation failed to reproduce prior radiogenomics model performance, while clinical variables alone retained strong predictive value. SMOTE-based oversampling did not improve diagnostic accuracy, suggesting that, in EGFR prediction, radiomics may offer limited value beyond clinical data. Emphasis on robust external validation and data-sharing is essential for future clinical implementation of radiogenomic models.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Academic Radiology
Academic Radiology 医学-核医学
CiteScore
7.60
自引率
10.40%
发文量
432
审稿时长
18 days
期刊介绍: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信