Comparing ensemble learning algorithms and severity of illness scoring systems in cardiac intensive care units: a retrospective study
Author: Beatriz Nistal-Nuño
Journal: Einstein-Sao Paulo (Impact Factor 1.1; JCR Q2, Medicine, General & Internal)
Published: 2024-10-14
DOI: 10.31744/einstein_journal/2024AO0467 (https://doi.org/10.31744/einstein_journal/2024AO0467)
Abstract
Background: Beatriz Nistal-Nuño designed an ensemble learning system for patients undergoing cardiac surgery and intensive care unit cardiology patients. Using sequences of cardiovascular and other intensive care unit physiological measurements in addition to static features, the system generates a score for predicting mortality in cardiac intensive care unit patients.
■ Gradient Boosting Machine and Random Forest models were built to predict mortality in cardiac intensive care units.
■ A total of 9,761 intensive care unit stays of patients admitted under Cardiac Surgery and Cardiac Medical services were studied.
■ The AUROC and AUPRC values were significantly superior to those of the seven conventional systems compared.
■ The machine learning models' calibration curves were substantially closer to the ideal line.
Objective: Logistic regression has traditionally been used to develop most tools for predicting intensive care unit mortality. The purpose of this study is to combine risk factors shared by patients undergoing cardiac surgery and intensive care unit cardiology patients to develop a machine learning risk score for predicting mortality in cardiac intensive care unit patients.
Methods: Gradient Boosting Machine and Distributed Random Forest models were developed using 9,761 intensive care unit stays from the MIMIC-III database. Sequential and static features were collected. The primary endpoint was intensive care unit mortality prediction. Discrimination, calibration, and accuracy statistics were evaluated. The predictive performance of traditional scoring systems was evaluated for comparison.
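As a rough illustration of the discrimination and calibration statistics named in the Methods, the sketch below computes AUROC (using the pairwise rank formulation) and the Brier score in plain Python. The function names and the toy labels and probabilities are hypothetical; they are not the study's code or data, and the study's own software is not specified here.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (0 is perfect; lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = died in the ICU, 0 = survived.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print("AUROC:", auroc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

The pairwise loop is quadratic in the number of stays, which is fine for a small example; a sort-based rank computation would be the usual choice for a dataset of several thousand stays.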
Results: The machine learning models' AUROC and AUPRC were significantly superior to those of all conventional systems for the primary endpoint (p<0.05), with an AUROC of 0.9413 for Gradient Boosting Machine and 0.9311 for Distributed Random Forest. Sensitivity was 0.6421 for Gradient Boosting Machine, 0.6 for Distributed Random Forest, and <0.3 for all conventional systems except serial SOFA (0.6316). Precision was 0.574 for Gradient Boosting Machine, 0.566 for Distributed Random Forest, and <0.5 for all conventional systems. The diagnostic odds ratio was 58.8144 for Gradient Boosting Machine, 51.2926 for Distributed Random Forest, and <34 for all conventional systems. The Brier score was 0.025 for Gradient Boosting Machine and 0.028 for Distributed Random Forest, and was worse for the traditional systems. The calibration curves of Gradient Boosting Machine and Distributed Random Forest were substantially closer to the ideal line.
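For readers less familiar with the accuracy statistics reported above, the following minimal sketch shows how sensitivity, precision, and the diagnostic odds ratio follow from the four cells of a confusion matrix. The function name and the counts are hypothetical and chosen only to keep the arithmetic easy to check; they are not the study's data.

```python
def accuracy_statistics(tp, fp, fn, tn):
    """Standard confusion-matrix statistics for a binary mortality classifier.

    tp: deaths predicted as deaths       fp: survivors predicted as deaths
    fn: deaths predicted as survivors    tn: survivors predicted as survivors
    """
    if min(tp, fp, fn, tn) <= 0:
        # The diagnostic odds ratio is undefined when fp or fn is zero.
        raise ValueError("all four counts must be positive in this sketch")
    return {
        "sensitivity": tp / (tp + fn),          # recall among patients who died
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),            # positive predictive value
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, not taken from the paper.
stats = accuracy_statistics(tp=30, fp=10, fn=20, tn=140)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these counts, 30 of 50 deaths are detected (sensitivity 0.6), 30 of 40 positive predictions are correct (precision 0.75), and the diagnostic odds ratio is (30 × 140) / (10 × 20) = 21.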
Conclusion: The machine learning models outperformed the traditional scoring systems compared, with Gradient Boosting Machine performing best. Discrimination and calibration were excellent for Gradient Boosting Machine, followed by Distributed Random Forest. The machine learning methods also performed better on most accuracy statistics.