Multimodal Machine Learning for Predicting Perioperative Safety Indicators in Spinal Surgery.

IF 4.9, Zone 1 (Medicine), Q1 CLINICAL NEUROLOGY
Kyle Mani, Thomas Scharfenberger, Samuel N Goldman, Emily Kleinbart, Evan Mostafa, Rafael De La Garza Ramos, Mitchell S Fourman, Ananth Eleswarapu
Spine Journal, published online 2025-03-29. DOI: 10.1016/j.spinee.2025.03.021
Citations: 0

Abstract

Background context: Machine learning (ML) algorithms can utilize the large amount of tabular data in electronic health records (EHRs) to predict peri-operative safety indicators. Integrating unstructured free-text inputs via natural language processing (NLP) may further enhance predictive accuracy.

Purpose: To design and validate a pre-operative multi-modal machine learning architecture that integrates structured EHR data (patient demographics, comorbidities, and clinical covariates) with unstructured free-text inputs (past medical and surgical history, medications, and problem lists) via natural language processing (NLP). The multi-modal models aim to improve the prediction of peri-operative safety indicators compared to baseline ML models that only use structured tabular EHR data.

Study design: Retrospective cohort study.

Patient sample: 1,898 patients admitted for elective or emergency spine surgery at four separate large urban academic spine centers over a five-year period from 2018 to 2023.

Outcome measures: Numerical outputs between 0 and 1 corresponding to the predicted likelihood of (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) peri-operative intensive care unit (ICU) admission.

Methods: We predicted the following safety indicators: (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) peri-operative intensive care unit (ICU) admission. The quanteda package for NLP in the R environment was used to preprocess free-text EHR inputs. The refined text was tokenized, transformed into numerical vectors using a bag-of-words approach, and integrated with the tabular EHR data to create a document-feature matrix. Two extreme gradient boosting (XGBoost) ML models were trained: a base model using only structured tabular EHR data, and a multi-modal model that combined structured tabular EHR data with numerical vectors derived from the free-text NLP inputs. Hyperparameters were tuned via grid search, and the models were validated using 10-fold cross-validation with an 80:20 training/testing split. Word clouds were generated for the free-text data, and explainable artificial intelligence (XAI) techniques were used to assess feature importance. Model performance metrics included the area under the receiver operating characteristic curve (AUC-ROC), Brier score, calibration slope, calibration intercept, precision, recall, and F1-score.
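The bag-of-words step and its integration with tabular data can be sketched in Python. This is a hedged analogue of what quanteda's `tokens()` and `dfm()` functions do in R, not the authors' code: the stopword list, the `min_count` cutoff, the helper names `tokenize`, `build_dfm`, and `combine`, and the example notes and tabular values (age, BMI, hemoglobin) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; quanteda ships much larger built-in lists.
STOPWORDS = {"of", "and", "the", "with", "a", "in", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, split, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_dfm(docs: list[str], min_count: int = 1) -> tuple[list[str], list[list[int]]]:
    """Bag-of-words document-feature matrix: one row per document, one column per term."""
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(tok for toks in tokenized for tok in toks)
    # Keep terms seen at least `min_count` times across the corpus (assumed trimming rule).
    vocab = sorted(t for t, c in totals.items() if c >= min_count)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def combine(tabular: list[list[float]], dfm_rows: list[list[int]]) -> list[list[float]]:
    """Column-bind structured features with term counts (rows must be the same patients)."""
    if len(tabular) != len(dfm_rows):
        raise ValueError("tabular and text rows must describe the same patients")
    return [list(t) + [float(c) for c in d] for t, d in zip(tabular, dfm_rows)]


# Hypothetical free-text problem-list entries and made-up tabular values.
notes = [
    "Vertebral osteomyelitis; radiculopathy.",
    "Cervical myelopathy with radiculopathy",
    "Spinal metastasis, on dexamethasone",
]
tabular = [[67.0, 31.2, 11.8], [58.0, 27.4, 13.5], [72.0, 24.9, 10.1]]

vocab, dfm = build_dfm(notes)
X = combine(tabular, dfm)
print(vocab)
print(X[0])
```

The combined rows would then feed a gradient-boosted tree learner such as XGBoost; in practice a sparse matrix is usually preferred, because most term counts for any given patient are zero.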

Results: A total of 1,898 patients (60.7% female) were identified between January 2018 and September 2023, with a median age of 60.0 years (IQR: 52.0-68.0) and a median body mass index (BMI) of 30.3 kg/m² (IQR: 26.3-34.6). Extended LOS was defined as ≥ 14.4 days and occurred in 10.1% of patients. The median LOS for the entire cohort was 4.0 days (IQR: 2.0-7.0); the 90-day reoperation rate was 10.54%, and the ICU admission rate was 7.74%. The pre-operative tabular EHR models predicted peri-operative safety indicators with AUCs ranging from 0.770 to 0.779, Brier scores from 0.074 to 0.099, and calibration slopes from 2.279 to 2.418. Precision and recall for these models ranged from 0.918 to 0.973 and from 0.988 to 0.994, respectively, yielding F1-scores between 0.954 and 0.973. The multi-modal models predicted peri-operative safety indicators with AUCs ranging from 0.827 to 0.903, Brier scores from 0.056 to 0.083, and calibration slopes from 0.755 to 1.217. The multi-modal models achieved precision ranging from 0.909 to 0.933 and recall from 0.979 to 0.994, yielding F1-scores between 0.943 and 0.962. Important tabular predictors included patient age, BMI, hemoglobin level, white blood cell count, platelet count, and a combined anterior/posterior spinal fusion approach. Important free-text inputs included vertebral osteomyelitis, radiculopathy, myelopathy, and spinal metastasis.
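The reported metrics can be computed from predicted probabilities and observed 0/1 outcomes, as in the plain-Python sketch below. It is not the authors' implementation: the 0.5 classification threshold, the logistic-recalibration definition of calibration slope and intercept (intercept estimated jointly with the slope, rather than with the slope fixed at 1), and the toy data are assumptions. A calibration slope near 1 indicates well-calibrated risk estimates; a slope above 1, as in the tabular models, suggests predicted risks compressed too tightly toward the average, whereas a slope below 1 suggests risks that are too extreme.

```python
import math


def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc_roc(y, p):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y, p, threshold=0.5):
    """Dichotomize probabilities at an assumed threshold, then score the labels."""
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 1)
    fp = sum(1 for yi, qi in zip(y, pred) if yi == 0 and qi == 1)
    fn = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def calibration_slope_intercept(y, p, iters=50):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; return (slope b, intercept a)."""
    x = [math.log(pi / (1 - pi)) for pi in (min(max(q, 1e-12), 1 - 1e-12) for q in p)]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = _sigmoid(a + b * xi)
            r, w = yi - mu, mu * (1 - mu)
            g0 += r          # score for intercept
            g1 += r * xi     # score for slope
            h00 += w         # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("need at least two distinct predicted probabilities")
        da = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 @ score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return b, a


# Toy data: predictions of 0.1 and 0.9, but observed event rates of 0.2 and 0.8.
y = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
p = [0.1] * 10 + [0.9] * 10
slope, intercept = calibration_slope_intercept(y, p)
print(f"Brier={brier_score(y, p):.3f} AUC={auc_roc(y, p):.3f}")
print("precision, recall, F1:", precision_recall_f1(y, p))
print(f"calibration slope={slope:.3f} intercept={intercept:.3f}")
```

In this toy example the predictions (0.1 and 0.9) are more extreme than the observed rates (0.2 and 0.8), so the fitted slope falls below 1.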

Conclusions: The multi-modal NLP models outperformed the baseline tabular models across all three outcome measures. Future work includes incorporating additional model dimensions, such as the history of present illness, physical exam, and spinal imaging, and clinically implementing the models into our informed consent and pre-operative optimization pathway.

Source journal: Spine Journal (Medicine - Clinical Neurology)
CiteScore: 8.20
Self-citation rate: 6.70%
Articles published: 680
Review time: 13.1 weeks
Journal description: The Spine Journal, the official journal of the North American Spine Society, is an international, multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.