Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study.

Koutarou Matsumoto, Yasunobu Nohara, Mikako Sakaguchi, Yohei Takayama, Syota Fukushige, Hidehisa Soejima, Naoki Nakashima, Masahiro Kamouchi
JMIR Perioperative Medicine. 2023;6:e50895. Published October 26, 2023. doi:10.2196/50895
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10636625/pdf/

Abstract

Background: Although machine learning models show considerable potential for predicting postoperative delirium, the advantages of implementing them in real-world settings remain unclear and must be assessed by comparing them with conventional models in practical applications.

Objective: This study aimed to validate the temporal generalizability of a decision tree ensemble model and a sparse linear regression model for predicting postoperative delirium, compared with that of a traditional logistic regression model.
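Temporal validation means fitting a model on earlier patients and testing it on later ones, rather than on a random holdout from the same period. The minimal sketch below illustrates such a date-based split; the record layout, the function name `temporal_split`, and the cutoff date used in the example are all hypothetical, since the abstract only states that the derivation cohort precedes and the validation cohort falls within the COVID-19 pandemic.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout for this sketch: (surgery_date, features, delirium_label).
Record = Tuple[date, Dict[str, float], int]


def temporal_split(
    records: Iterable[Record], cutoff: date
) -> Tuple[List[Record], List[Record]]:
    """Split cases by surgery date instead of at random.

    Cases before `cutoff` form the derivation cohort; cases on or after it
    form the validation cohort, so every validation case is later than every
    derivation case, mimicking use of the model on future patients.
    """
    derivation: List[Record] = []
    validation: List[Record] = []
    for rec in records:
        surgery_date = rec[0]
        if surgery_date < cutoff:
            derivation.append(rec)
        else:
            validation.append(rec)
    return derivation, validation


# Illustrative records and cutoff only; the abstract does not give the exact
# boundary date between the pre-pandemic and pandemic periods.
records: List[Record] = [
    (date(2018, 3, 14), {"age": 72.0}, 0),
    (date(2019, 11, 2), {"age": 85.0}, 1),
    (date(2021, 6, 30), {"age": 64.0}, 0),
]
derivation, validation = temporal_split(records, cutoff=date(2020, 4, 1))
print(len(derivation), len(validation))  # 2 1
```

Unlike a random split, this design exposes the model to any drift in case mix or clinical practice between the two periods, which is what a temporal generalizability study is meant to test.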

Methods: Health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the models, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination, and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was used as an overall performance metric. We included 11,863 consecutive patients who underwent surgery under general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort (before the COVID-19 pandemic) and a validation cohort (during the COVID-19 pandemic). Postoperative delirium was diagnosed using the Confusion Assessment Method (CAM).
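The evaluation metrics named above can be computed from observed 0/1 outcomes and predicted risks. The pure-Python sketch below makes their definitions explicit, assuming predicted probabilities strictly between 0 and 1; all function names are hypothetical. Two choices are assumptions not stated in the abstract: MCC requires dichotomizing predicted risk, and the `threshold=0.5` default is only a placeholder; calibration slope and intercept are estimated here by logistic recalibration (regressing the outcome on the logit of predicted risk), which is one common reading of "the regression between predicted and observed probabilities". Some studies instead report the intercept with the slope fixed at 1 (calibration-in-the-large).

```python
import math
from typing import Sequence, Tuple


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case with delirium gets a higher predicted
    risk than a random case without it (ties count 1/2). O(n_pos * n_neg)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mcc(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Matthews correlation coefficient after dichotomizing risk at `threshold`
    (a hypothetical placeholder; the abstract does not state the cutoff)."""
    tp = fp = tn = fn = 0
    for yi, pi in zip(y, p):
        predicted = pi >= threshold
        if predicted and yi == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_intercept_slope(
    y: Sequence[int], p: Sequence[float], max_iter: int = 100
) -> Tuple[float, float]:
    """Logistic recalibration: fit logit P(y=1) = a + b * logit(p) by Newton-Raphson.
    Perfect calibration gives a = 0, b = 1; b < 1 means predictions are too
    extreme, b > 1 means they are too moderate. Requires 0 < p < 1."""
    x = [math.log(pi / (1.0 - pi)) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        ga = gb = iaa = iab = ibb = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            ga += yi - mu
            gb += (yi - mu) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det  # Newton step: (a, b) += I^-1 @ score
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return a, b


# Toy data for illustration only (not from the study).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p), round(brier(y, p), 6))  # 0.75 0.158125
```

In practice, scikit-learn's `roc_auc_score`, `matthews_corrcoef`, and `brier_score_loss` compute the first three metrics; the hand-written versions only serve to show what each number measures.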

Results: The derivation cohort included 6497 patients (mean age 68.5, SD 14.4 years; women, n=2627, 40.4%), and the validation cohort included 5366 patients (mean age 67.8, SD 14.6 years; women, n=2105, 39.2%). Regarding discrimination, the XGBoost model (AUROC 0.87-0.90; MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89; MCC 0.34-0.41). The logistic regression model with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance (AUROC 0.84-0.88, MCC 0.33-0.40, calibration slope 1.01-1.19, calibration intercept -0.16 to 0.06, and Brier score 0.06-0.07).
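For reference, a logistic regression model built on the 8 predictors listed above has the general form below. The abstract reports neither the fitted coefficients nor how each predictor is coded (for example, the units of anesthesia time and blood loss, or whether intensive care unit is a binary indicator), so the coefficients beta_0 to beta_8 are placeholders, not the authors' estimates.

```latex
\operatorname{logit}\Pr(\text{delirium} = 1)
  = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{ICU} + \beta_3\,\text{neurosurgery}
  + \beta_4\,\text{emergency admission} + \beta_5\,\text{anesthesia time}
  + \beta_6\,\text{BMI} + \beta_7\,\text{blood loss} + \beta_8\,\text{ambulance use}

\Pr(\text{delirium} = 1)
  = \frac{1}{1 + \exp\!\bigl(-\operatorname{logit}\Pr(\text{delirium} = 1)\bigr)}
```

A model of this size can be written out and applied by hand, which is part of why a parsimonious logistic model is attractive when its performance is comparable to that of XGBoost or LASSO.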

Conclusions: The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved performance comparable to that of the machine learning models.
