Hospital Length of Stay Prediction for Planned Admissions Using Observational Medical Outcomes Partnership Common Data Model: Retrospective Study.

IF 6; CAS Zone 2 (Medicine); Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Medical Internet Research; Pub Date: 2024-11-22; DOI: 10.2196/59260

Haeun Lee, Seok Kim, Hui-Woun Moon, Ho-Young Lee, Kwangsoo Kim, Se Young Jung, Sooyoung Yoo

Journal of Medical Internet Research, volume 26, e59260. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11624451/pdf/

Citations: 0

Abstract

Background: Accurate hospital length of stay (LoS) prediction enables efficient resource management. Conventional LoS prediction models with limited covariates and nonstandardized data have limited reproducibility when applied to the general population.

Objective: In this study, we developed and validated a machine learning (ML)-based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM).

Methods: Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital.
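The modeling steps described in the Methods (a 7:3 train/test split, Lasso-regularized logistic regression for feature selection, a tree-based classifier, and AUROC/AUPRC evaluation) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the data generator, the regularization strength `C=0.1`, the selection threshold, and the use of scikit-learn's `GradientBoostingClassifier` as a stand-in for the gradient boosting models named in the abstract are all assumptions. `average_precision_score` is used as a common approximation of the area under the precision-recall curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_demo_data(seed=0):
    """Synthetic stand-in for OMOP CDM covariates (hypothetical data)."""
    return make_classification(
        n_samples=2000, n_features=30, n_informative=8,
        weights=[0.7, 0.3], random_state=seed,
    )


def fit_and_score(X, y, seed=0):
    """Split 7:3, select features with an L1 logistic regression, then classify."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # L1 (Lasso) penalty drives uninformative coefficients to zero;
    # C=0.1 and the 1e-5 threshold are illustrative assumptions.
    lasso_lr = LogisticRegression(
        penalty="l1", solver="liblinear", C=0.1, random_state=seed
    )
    model = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectFromModel(lasso_lr, threshold=1e-5)),
        # Stand-in for the boosting models named in the abstract.
        ("clf", GradientBoostingClassifier(random_state=seed)),
    ])
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    return {
        "auroc": roc_auc_score(y_test, prob),
        "auprc": average_precision_score(y_test, prob),
        "n_selected": int(model.named_steps["select"].get_support().sum()),
        "n_test": len(y_test),
    }


if __name__ == "__main__":
    X, y = make_demo_data()
    print(fit_and_score(X, y))
```

Placing the selector inside the pipeline means feature selection is fit on the training split only, which avoids leaking test-set information into the selected feature set.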

Results: The final sample included 129,938 patient entry events in the planned admissions. The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95% CI 0.887-0.894) and an AUPRC of 0.819 (95% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed the best in the multiclass classification for predicting an LoS of 3 days or more, with an AUROC of 0.901 (95% CI 0.898-0.904) and an AUPRC of 0.770 (95% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. The random forest (RF) model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95% CI 0.802-0.807).
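The Results report each AUROC with a 95% CI. The abstract does not say how those intervals were computed; one common approach is a percentile bootstrap over the test set, sketched below under that assumption. The function name `bootstrap_auroc_ci`, the number of resamples, and the synthetic labels and scores are all hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) for AUROC via percentile bootstrap."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUROC is undefined when only one class is drawn
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), float(lower), float(upper)


if __name__ == "__main__":
    # Hypothetical labels and noisy scores, for illustration only.
    demo_rng = np.random.default_rng(1)
    labels = demo_rng.integers(0, 2, size=400)
    scores = labels + demo_rng.normal(0, 1, size=400)
    print(bootstrap_auroc_ci(labels, scores, n_boot=300))
```

With test sets as large as the one in this study, intervals of this kind become narrow, which is consistent with the tight ranges reported above.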

Conclusions: The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings.




Source journal: Journal of Medical Internet Research (Medicine - Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

Journal description: The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor. Notably, JMIR is ranked #1 on Google Scholar within the "Medical Informatics" discipline.