The effect of resampling techniques on the performances of machine learning clinical risk prediction models in the setting of severe class imbalance: development and internal validation in a retrospective cohort.

Discover artificial intelligence. Pub Date: 2024-01-01. Epub Date: 2024-11-26. DOI: 10.1007/s44163-024-00199-0
Janny Xue Chen Ke, Arunachalam DhakshinaMurthy, Ronald B George, Paula Branco
Discover artificial intelligence, volume 4(1), page 91. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610218/pdf/ Publisher link: https://doi.org/10.1007/s44163-024-00199-0

Abstract

Purpose: The availability of population datasets and machine learning techniques has heralded a new era of sophisticated prediction models involving a large number of routinely collected variables. However, severe class imbalance in clinical datasets is a major challenge. The aim of this study is to investigate the impact of commonly used resampling techniques in combination with commonly used machine learning algorithms in a clinical dataset, to determine whether any combination of these approaches improves upon the original multivariable logistic regression with no resampling.

Methods: We previously developed and internally validated a multivariable logistic regression 30-day mortality prediction model in 30,619 patients using preoperative and intraoperative features. Using the same dataset, we systematically evaluated and compared model performance after application of resampling techniques [random under-sampling, near miss under-sampling, random oversampling, and synthetic minority oversampling (SMOTE)] in combination with machine learning algorithms (logistic regression, elastic net, decision trees, random forest, and extreme gradient boosting).
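The resampling techniques named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, a made-up two-feature toy dataset, and hypothetical function names (`random_undersample`, `random_oversample`, `smote_like`). It assumes label 1 is the minority class with at least two rows, and the SMOTE-style function shows only the core interpolation idea. Near miss under-sampling is not covered.

```python
import random

def _split_indices(y):
    """Return (minority, majority) row indices, assuming label 1 is the minority."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    return minority, majority

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (with replacement) until classes balance."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def smote_like(X, y, k=3, seed=0):
    """Append synthetic minority rows interpolated between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    rows = [X[i] for i in minority]
    X_new, y_new = list(X), list(y)
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(1)
    return X_new, y_new

# Toy training set with 5% events (hypothetical data, not the study cohort).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
X_smote, y_smote = smote_like(X, y)
print(len(y_under), sum(y_under))   # rows, events after under-sampling
print(len(y_over), sum(y_over))     # rows, events after oversampling
print(len(y_smote), sum(y_smote))   # rows, events after SMOTE-style synthesis
```

In a pipeline like the one described, resampling would normally be applied to the training portion only, so that validation data keep the original class distribution.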

Results: We found that in the setting of severe class imbalance, the impact of resampling techniques on model performance varied by the machine learning algorithm and the evaluation metric. Existing resampling techniques did not meaningfully improve the area under the receiver operating characteristic curve (AUROC). The area under the precision-recall curve (AUPRC) increased only with random under-sampling and SMOTE for decision trees, and with random oversampling and SMOTE for extreme gradient boosting. Importantly, some combinations of algorithm and resampling technique decreased AUROC and AUPRC compared to no resampling.
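The two metrics reported in the Results can be computed as in the sketch below. It is an illustration under stated assumptions rather than the study's evaluation code: the function names and the ten-row example are hypothetical, AUROC is computed as the fraction of event/non-event pairs ranked correctly (ties count one half), and AUPRC is approximated by average precision, a common step-wise estimate in which tied scores are simply taken in sort order.

```python
def auroc(y_true, scores):
    """Probability that a random event is scored above a random non-event."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values observed at the rank of each event."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical predictions for 10 patients, 2 of whom had the event.
y_true = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.30, 0.05, 0.15, 0.25]

print(auroc(y_true, scores))              # 0.875
print(average_precision(y_true, scores))  # 0.75
```

The two metrics respond differently to imbalance: a non-informative model has an expected AUROC of about 0.5 regardless of prevalence, whereas its expected AUPRC is roughly the event prevalence. This is one reason the same resampling step can look different depending on the metric used.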

Conclusion: Existing resampling techniques had a variable impact on models, depending on the algorithms and the evaluation metrics. Future research is needed to improve predictive performances in the setting of severe class imbalance.
