Impact of harmonization and oversampling methods on radiomics analysis of multi-center imbalanced datasets: application to PET-based prediction of lung cancer subtypes.

IF 3.2 | Zone 2 (Medicine) | Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

EJNMMI Physics Pub Date : 2025-04-07 DOI:10.1186/s40658-025-00750-7

Dongyang Du, Isaac Shiri, Fereshteh Yousefirizi, Mohammad R Salmanpour, Jieqin Lv, Huiqin Wu, Wentao Zhu, Habib Zaidi, Lijun Lu, Arman Rahmim

EJNMMI Physics 12(1):34. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11977052/pdf/

Citations: 0

Abstract

Background: Medical imaging data frequently exhibit heterogeneity in image generation as well as class imbalance, both of which hinder the generalizable predictive performance of data-driven machine-learning methods. The purpose of this study was to investigate the impact of harmonization and oversampling methods on multi-center imbalanced datasets, with specific application to PET-based radiomics modeling for histologic subtype prediction in non-small cell lung cancer (NSCLC).

Methods: This retrospective study included 245 patients with adenocarcinoma (ADC) and 78 patients with squamous cell carcinoma (SCC) from 4 centers. Using 1502 radiomics features per patient, we trained, validated, and tested 4 machine-learning classifiers to investigate the effect of no harmonization (NoH) or 4 feature harmonization methods, paired with no oversampling (NoO) or 5 oversampling methods, on subtype prediction. Model performance was evaluated using the average area under the ROC curve (AUROC) and G-mean over 5 repetitions of 5-fold cross-validation. Statistical comparisons of the combined models against baseline (NoH + NoO) were performed for each fold of cross-validation using the DeLong test.
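The experimental grid described in the Methods can be enumerated in a few lines. The following is a minimal sketch, not the authors' code: the abstract names only ComBat and SMOTE, so the labels `H2`-`H4` and `O2`-`O5` are hypothetical placeholders, and the fit count assumes every combination is refit once per fold.

```python
from itertools import product

# Only "ComBat" and "SMOTE" are named in the abstract; the other labels
# (H2..H4, O2..O5) are hypothetical placeholders for the unnamed methods.
harmonization = ["NoH", "ComBat", "H2", "H3", "H4"]      # none + 4 methods
oversampling = ["NoO", "SMOTE", "O2", "O3", "O4", "O5"]  # none + 5 methods
classifiers = ["RF", "LDA", "LR", "SVM"]

baseline = ("NoH", "NoO")
grid = list(product(harmonization, oversampling))
non_baseline = [combo for combo in grid if combo != baseline]

# 5 repetitions of 5-fold cross-validation
n_repeats, n_folds = 5, 5
fits_per_model = n_repeats * n_folds

# Assumption: each (combination, classifier) pair is refit in every fold.
total_fits = len(grid) * len(classifiers) * fits_per_model

print(len(grid), len(non_baseline), fits_per_model, total_fits)
```

The 29 non-baseline combinations match the "out of 29" denominator reported in the Results.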

Results: The number of cross-combinations with both AUROC and G-mean outperforming baseline in validation and testing was 15, 4, 2, and 7 (out of 29) for random forest (RF), linear discriminant analysis (LDA), logistic regression (LR), and support vector machine (SVM), respectively. ComBat harmonization combined with SMOTE oversampling and an RF classifier yielded better performance than baseline (validation AUROC and G-mean: 0.725 vs. 0.608 and 0.625 vs. 0.398; testing: 0.637 vs. 0.567 and 0.506 vs. 0.287), although the differences were not statistically significant.
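G-mean is commonly defined as the geometric mean of sensitivity and specificity, which is why it penalizes models that neglect the minority class. The sketch below assumes that standard definition (the abstract does not spell out its formula) and treats SCC as the positive class, which is also an assumption. It uses the cohort sizes from the Methods to show how a classifier that labels every patient as ADC scores well on accuracy but zero on G-mean.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (standard definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Cohort sizes from the abstract: 245 ADC (majority), 78 SCC (minority).
n_adc, n_scc = 245, 78

# Hypothetical degenerate classifier that predicts ADC for every patient.
# Assumption: SCC is treated as the positive class.
tp, fn = 0, n_scc   # every SCC case is missed
tn, fp = n_adc, 0   # every ADC case is correct

accuracy = (tp + tn) / (n_adc + n_scc)
print(round(accuracy, 3), g_mean(tp, fn, tn, fp))  # 0.759 0.0
```

Read this way, the low baseline G-mean values suggest a model leaning toward the majority class, which is the failure mode oversampling is meant to counter.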

Conclusions: Applying harmonization and oversampling methods in multi-center imbalanced datasets can improve NSCLC-subtype prediction, but the effect varies widely across classifiers. We have created open-source comparisons of harmonization and oversampling on different classifiers for comprehensive evaluations in different studies.


Source journal

EJNMMI Physics (Physics and Astronomy: Radiation)

CiteScore: 6.70

Self-citation rate: 10.00%

Review time: 13 weeks

About the journal: EJNMMI Physics is an international platform for scientists, users and adopters of nuclear medicine with a particular interest in physics matters. As a companion journal to the European Journal of Nuclear Medicine and Molecular Imaging, this journal has a multi-disciplinary approach and welcomes original materials and studies with a focus on applied physics and mathematics as well as imaging systems engineering and prototyping in nuclear medicine. This includes physics-driven approaches or algorithms supported by physics that foster early clinical adoption of nuclear medicine imaging and therapy.