Mortality Prediction Performance Under Geographical, Temporal, and COVID-19 Pandemic Dataset Shift: External Validation of the Global Open-Source Severity of Illness Score Model.

IF 2.7 Q4 Medicine

Critical Care Explorations. Pub Date: 2025-06-04; eCollection Date: 2025-06-01; DOI: 10.1097/CCE.0000000000001275

Takeshi Tohyama, Liam G McCoy, Euma Ishii, Sahil Sood, Jesse Raffa, Takahiro Kinoshita, Leo Anthony Celi, Satoru Hashimoto

Critical Care Explorations, volume 7, issue 6, article e1275. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12140679/pdf/


Abstract

Background: Risk-prediction models are widely used for quality of care evaluations, resource management, and patient stratification in research. While established models have long been used for risk prediction, healthcare has evolved significantly, and the optimal model must be selected for evaluation in line with contemporary healthcare settings and regional considerations.

Objectives: To evaluate the geographic and temporal generalizability of the models for mortality prediction in ICUs through external validation in Japan.

Derivation cohort: Not applicable.

Validation cohort: The Japanese Intensive care PAtient Database (JIPAD), 2015 to 2022.

Prediction model: The Global Open-Source Severity of Illness Score (GOSSIS-1), a modern risk model built with machine learning approaches, was compared with two conventional models, the Acute Physiology and Chronic Health Evaluation II and III (APACHE-II and APACHE-III), and with a locally calibrated model, the Japan Risk of Death (JROD).

Results: Despite the demographic and clinical differences of the validation cohort, GOSSIS-1 maintained strong discrimination, with an area under the curve of 0.908, comparable to APACHE-III (0.908) and JROD (0.910). It was also better calibrated than the conventional models: its standardized mortality ratio (SMR) was 0.89 (95% CI, 0.88-0.90), significantly outperforming APACHE-II (SMR, 0.39; 95% CI, 0.39-0.40) and APACHE-III (SMR, 0.46; 95% CI, 0.46-0.47) and approaching JROD (SMR, 0.97; 95% CI, 0.96-0.99). However, performance varied significantly across disease categories, with suboptimal calibration for neurologic conditions and trauma. The model was temporally stable from 2015 to 2019, but its performance deteriorated broadly across disease categories in 2020, during the COVID-19 pandemic. This deterioration was more pronounced in GOSSIS-1 than in APACHE-III.
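The two headline metrics can be illustrated with a minimal sketch. It assumes the standard definitions: SMR is observed deaths divided by the sum of predicted death probabilities, and the area under the curve is the probability that a randomly chosen non-survivor received a higher predicted risk than a randomly chosen survivor. The function names, the toy data, and the log-normal confidence interval are illustrative assumptions; the abstract does not state how the study computed its intervals.

```python
import math


def smr_with_ci(died, predicted_risk, z=1.96):
    """Return (SMR, lower, upper) for observed vs. expected deaths.

    The interval is an assumed log-normal approximation that treats the
    observed death count as Poisson; it is not the study's stated method.
    """
    observed = sum(died)
    expected = sum(predicted_risk)
    smr = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    return smr, smr * math.exp(-z * se_log), smr * math.exp(z * se_log)


def auc(died, predicted_risk):
    """Rank-based AUC: share of (non-survivor, survivor) pairs in which the
    non-survivor has the higher predicted risk; ties count as half."""
    pos = [p for d, p in zip(died, predicted_risk) if d == 1]
    neg = [p for d, p in zip(died, predicted_risk) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (made-up values, not study data).
died = [1, 0, 0, 1, 0, 0, 0, 1]
risk = [0.90, 0.20, 0.10, 0.60, 0.30, 0.05, 0.40, 0.35]

smr, lower, upper = smr_with_ci(died, risk)
discrimination = auc(died, risk)
print(f"SMR = {smr:.2f} (95% CI, {lower:.2f}-{upper:.2f}); AUC = {discrimination:.3f}")
```

Under these definitions, an SMR below 1 means fewer deaths were observed than the model predicted, so values such as 0.39 or 0.46 indicate substantial overprediction of mortality in this cohort, even when discrimination is high.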

Conclusions: GOSSIS-1 demonstrates robust discrimination despite substantial geographic dataset shift but shows important calibration variations across disease categories. In particular, in a complex model like GOSSIS-1, stresses on the health system, such as a pandemic, can manifest as changes in model calibration.
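One way to look for the kind of calibration variation described here is to recompute the observed-to-expected ratio within each stratum, such as admission year or disease category. The sketch below is a minimal illustration under that assumption; the record fields (`year`, `category`, `died`, `risk`) and all values are hypothetical and do not reproduce the study's data or analysis.

```python
from collections import defaultdict


def smr_by_group(records, key):
    """Observed deaths / sum of predicted risks, computed per value of `key`."""
    observed = defaultdict(int)
    expected = defaultdict(float)
    for record in records:
        observed[record[key]] += record["died"]
        expected[record[key]] += record["risk"]
    return {group: observed[group] / expected[group] for group in observed}


# Hypothetical toy records (made-up values; the direction of the shift is arbitrary).
records = [
    {"year": 2019, "category": "cardiac", "died": 0, "risk": 0.25},
    {"year": 2019, "category": "cardiac", "died": 1, "risk": 0.25},
    {"year": 2019, "category": "neurologic", "died": 0, "risk": 0.25},
    {"year": 2019, "category": "neurologic", "died": 0, "risk": 0.25},
    {"year": 2020, "category": "cardiac", "died": 1, "risk": 0.25},
    {"year": 2020, "category": "cardiac", "died": 0, "risk": 0.25},
    {"year": 2020, "category": "neurologic", "died": 1, "risk": 0.25},
    {"year": 2020, "category": "neurologic", "died": 0, "risk": 0.25},
]

by_year = smr_by_group(records, "year")
by_category = smr_by_group(records, "category")
print(by_year)
print(by_category)
```

A ratio that drifts away from its earlier level in one year or one category, while overall discrimination stays similar, is the pattern that would suggest a calibration shift rather than a loss of ranking ability.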


