Multimodal machine learning for predicting perioperative safety indicators in spinal surgery.

IF 4.7, Zone 1 (Medicine), Q1 Clinical Neurology

Spine Journal Pub Date : 2025-03-29 DOI:10.1016/j.spinee.2025.03.021

Kyle Mani, Thomas Scharfenberger, Samuel N Goldman, Emily Kleinbart, Evan Mostafa, Rafael De La Garza Ramos, Mitchell S Fourman, Ananth Eleswarapu

Citations: 0

Abstract

Background context: Machine learning (ML) algorithms can utilize the large amount of tabular data in electronic health records (EHRs) to predict perioperative safety indicators. Integrating unstructured free-text inputs via natural language processing (NLP) may further enhance predictive accuracy.

Purpose: To design and validate a preoperative multimodal ML architecture that integrates structured EHR data (patient demographics, comorbidities, and clinical covariates) with unstructured free-text inputs (past medical and surgical history, medications, and problem lists) via NLP. The multimodal models aim to improve the prediction of perioperative safety indicators compared to baseline ML models that only use structured tabular EHR data.

Study design: Retrospective cohort study.

Patient sample: 1,898 patients admitted for elective or emergency spine surgery at four separate large urban academic spine centers during a 5-year period from 2018 to 2023.

Outcome measures: Numerical outputs between 0 and 1 corresponding to the likelihood of (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) perioperative intensive care unit (ICU) admission.

Methods: We predicted the following safety indicators: (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) perioperative intensive care unit (ICU) admission. The quanteda package for NLP within the R environment was used to preprocess free-text EHR inputs. The refined text was tokenized and transformed into numerical vectors using a bag-of-words approach and integrated with the tabular EHR data to create a document-feature matrix. Two extreme gradient boosted (XGBoost) ML models were trained: a base model using only structured tabular EHR data and a combined multimodal model that used both structured tabular EHR data and numerical vectors derived from free-text NLP inputs. Hyperparameter tuning was performed via grid search, and the models were validated using 10-fold cross-validation with an 80:20 training/testing split. Word clouds were generated for the free-text data, and explainable artificial intelligence (XAI) techniques were employed for feature importance. Metrics calculated for model performance included area under the receiver operating characteristic curve (AUC-ROC), Brier score, calibration slope, calibration intercept, precision, recall, and F1-score.
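The bag-of-words step described in the Methods can be illustrated with a minimal sketch. The study used the quanteda package in R; the version below swaps in plain Python purely for illustration. The tokenizer, the helper names (`tokenize`, `build_dfm`, `combine`), and the toy patient records are all assumptions and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only.

    A simplistic stand-in for the quanteda preprocessing named in the
    abstract; the actual cleaning rules are not described there.
    """
    return re.findall(r"[a-z]+", text.lower())


def build_dfm(documents):
    """Build a document-feature matrix of raw term counts.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i].
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def combine(tabular_rows, dfm_rows):
    """Concatenate each patient's tabular features with their term counts."""
    if len(tabular_rows) != len(dfm_rows):
        raise ValueError("tabular and text inputs must have the same number of rows")
    return [list(t) + list(d) for t, d in zip(tabular_rows, dfm_rows)]


# Hypothetical toy records, invented for illustration only.
notes = [
    "Vertebral osteomyelitis; radiculopathy",
    "Myelopathy, prior lumbar fusion",
    "Spinal metastasis with radiculopathy",
]
tabular = [[64, 31.2], [55, 27.8], [71, 24.5]]  # [age, BMI], invented values

vocabulary, dfm = build_dfm(notes)
features = combine(tabular, dfm)

print(vocabulary)
print(features[0])
```

Each row of `features` could then be passed to a gradient-boosted classifier. The base model in the study would correspond to training on the `tabular` columns alone.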

Results: A total of 1,898 patients (60.7% female) were included from January 2018 to September 2023, with a median age of 60.0 years (IQR: 52.0-68.0) and median body mass index (BMI) of 30.3 kg/m² (IQR: 26.3-34.6). Extended LOS was defined as ≥ 14.4 days, constituting 10.1% of all individuals. The median LOS for the entire cohort was 4.0 days (IQR: 2.0-7.0), while the 90-day reoperation rate was 10.54%, and the ICU admission rate was 7.74%. The preoperative tabular EHR models predicted perioperative safety indicators with AUC ranging from 0.770 to 0.779, Brier scores ranging from 0.074 to 0.099, and calibration slopes ranging from 2.279 to 2.418. Precision and recall for these models ranged from 0.918 to 0.973 and 0.988 to 0.994, respectively, resulting in F1-scores between 0.954 and 0.973. The combined multimodal models predicted perioperative safety indicators with AUC ranging from 0.827 to 0.903, Brier scores ranging from 0.056 to 0.083, and calibration slopes ranging from 0.755 to 1.217. The multimodal models achieved precision ranging from 0.909 to 0.933 and recall ranging from 0.979 to 0.994, leading to F1-scores between 0.943 and 0.962. Important tabular predictors included patient age, BMI, hemoglobin level, white blood cell count, platelet count, and a combined anterior/posterior spinal fusion approach. Important free-text inputs included vertebral osteomyelitis, radiculopathy, myelopathy, and spinal metastasis.
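For readers less familiar with the reported metrics, the sketch below shows how AUC-ROC, Brier score, precision, recall, and F1-score are computed from predicted probabilities. It uses the standard textbook definitions, not the authors' code. The labels, probabilities, and the 0.5 decision threshold are invented for illustration and are unrelated to the study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities, then compute precision, recall, and F1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy labels and probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.6, 0.9]

auc = auc_roc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
precision, recall, f1 = precision_recall_f1(y_true, y_prob)

print(f"AUC={auc:.3f} Brier={brier:.4f} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

A lower Brier score indicates better-calibrated probabilities, while AUC measures ranking ability independent of any threshold. Precision, recall, and F1 depend on the chosen threshold, which the abstract does not state.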

Conclusions: The multimodal NLP model exhibited superior performance in all outcome measures when compared to the baseline tabular model. Future work includes incorporating additional model dimensions, such as the history of present illness, physical exam, and spinal imaging, and clinically implementing the models into our informed consent and preoperative optimization pathway.

Source journal: Spine Journal (Medicine, Clinical Neurology)

CiteScore: 8.20

Self-citation rate: 6.70%

Articles published: 680

Review time: 13.1 weeks

Journal description: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.