Multimodal Machine Learning for Predicting Perioperative Safety Indicators in Spinal Surgery
Kyle Mani, Thomas Scharfenberger, Samuel N Goldman, Emily Kleinbart, Evan Mostafa, Rafael De La Garza Ramos, Mitchell S Fourman, Ananth Eleswarapu
Spine Journal. Published 2025-03-29. DOI: 10.1016/j.spinee.2025.03.021
Abstract
Background context: Machine learning (ML) algorithms can utilize the large amount of tabular data in electronic health records (EHRs) to predict peri-operative safety indicators. Integrating unstructured free-text inputs via natural language processing (NLP) may further enhance predictive accuracy.
Purpose: To design and validate a pre-operative multi-modal machine learning architecture that integrates structured EHR data (patient demographics, comorbidities, and clinical covariates) with unstructured free-text inputs (past medical and surgical history, medications, and problem lists) via natural language processing (NLP). The multi-modal models aim to improve the prediction of peri-operative safety indicators compared to baseline ML models that only use structured tabular EHR data.
Study design: Retrospective cohort study.
Patient sample: 1,898 patients admitted for elective or emergency spine surgery at four separate large urban academic spine centers during a five-year period from 2018 to 2023.
Outcome measures: Numerical outputs between 0 and 1 corresponding to the likelihood of (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) peri-operative intensive care unit (ICU) admission.
Methods: We predicted the following safety indicators: (I) extended length of stay (LOS), (II) 90-day reoperation, and (III) peri-operative intensive care unit (ICU) admission. The quanteda package for NLP within the R environment was used to preprocess free-text EHR inputs. The refined text was tokenized and transformed into numerical vectors using a bag-of-words approach and integrated with the tabular EHR data to create a document-feature matrix. Two extreme gradient boosted (XGBoost) ML models were trained: a base model using only structured tabular EHR data and a multi-modal model that combined structured tabular EHR data with numerical vectors derived from free-text NLP inputs. Hyperparameter tuning was performed via grid search, and the models were validated using 10-fold cross-validation with an 80:20 training/testing split. Word clouds were generated for the free-text data, and explainable artificial intelligence (XAI) techniques were employed to assess feature importance. Metrics calculated for model performance included area under the receiver operating characteristic curve (AUC-ROC), Brier score, calibration slope, calibration intercept, precision, recall, and F1-score.
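The feature-construction step described above can be sketched in a few lines. This is a minimal illustration only, written in Python with the standard library rather than the quanteda/R pipeline the authors used; the function names, the simple regex tokenizer, and the two toy patients are all assumptions made for the example, not details from the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase free text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes):
    """Return the sorted list of distinct tokens seen across all notes."""
    return sorted({token for note in notes for token in tokenize(note)})


def bag_of_words(note, vocabulary):
    """Count how often each vocabulary term occurs in one note."""
    counts = Counter(tokenize(note))
    return [counts[term] for term in vocabulary]


def combine_features(tabular_rows, notes):
    """Append each patient's bag-of-words counts to their tabular features."""
    vocabulary = build_vocabulary(notes)
    matrix = [
        list(row) + bag_of_words(note, vocabulary)
        for row, note in zip(tabular_rows, notes)
    ]
    return matrix, vocabulary


# Hypothetical toy data: (age, BMI) plus a free-text problem-list snippet.
tabular = [(60.0, 30.3), (72.0, 26.1)]
notes = [
    "vertebral osteomyelitis, radiculopathy",
    "myelopathy; spinal metastasis, myelopathy",
]

matrix, vocab = combine_features(tabular, notes)
print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` holds the tabular columns followed by one count per vocabulary term, which is the general shape of input a gradient-boosted model such as XGBoost could be trained on. A real pipeline would likely also remove stop words and rare terms, which this sketch omits.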
Results: 1,898 patients (60.7% female) were extracted from January 2018 to September 2023, with a median age of 60.0 (IQR: 52.0 - 68.0) and median body mass index (BMI) of 30.3 kg/m² (IQR: 26.3 - 34.6). Extended LOS was defined as ≥ 14.4 days, constituting 10.1% of all individuals. The median LOS for the entire cohort was 4.0 days (IQR: 2.0 - 7.0), while the 90-day reoperation rate was 10.54% and the ICU admission rate was 7.74%. The pre-operative tabular EHR models predicted peri-operative safety indicators with AUC ranging from 0.770 to 0.779, Brier scores ranging from 0.074 to 0.099, and calibration slopes ranging from 2.279 to 2.418. Precision and recall for these models ranged from 0.918 to 0.973 and 0.988 to 0.994, respectively, resulting in F1-scores between 0.954 and 0.973. The combined multi-modal models predicted peri-operative safety indicators with AUC ranging from 0.827 to 0.903, Brier scores ranging from 0.056 to 0.083, and calibration slopes ranging from 0.755 to 1.217. The multi-modal models achieved precision ranging from 0.909 to 0.933 and recall ranging from 0.979 to 0.994, leading to F1-scores between 0.943 and 0.962. Important tabular predictors included patient age, BMI, hemoglobin level, white blood cell count, platelet count, and a combined anterior/posterior spinal fusion approach. Important free-text inputs included vertebral osteomyelitis, radiculopathy, myelopathy, and spinal metastasis.
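For readers less familiar with the reported metrics, the sketch below shows how a Brier score and precision, recall, and F1-score are computed from predicted probabilities. It is an illustration under stated assumptions: the labels and probabilities are invented toy values, and the 0.5 decision threshold and the choice of the event as the positive class are assumptions, since the abstract states neither.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 is best)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def precision_recall_f1(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities, then compute precision, recall, and F1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy outcomes (1 = event occurred) and model probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.8, 0.4]

print(brier_score(y_true, y_prob))
print(precision_recall_f1(y_true, y_prob))
```

The Brier score is computed on the probabilities themselves, so it reflects calibration as well as discrimination, whereas precision, recall, and F1 depend on the threshold and on which class is treated as positive.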
Conclusions: The multi-modal NLP model exhibited superior performance in all outcome measures when compared to the baseline tabular model. Future work includes incorporating additional model dimensions, such as the history of present illness, physical exam, and spinal imaging, and clinically implementing the models into our informed consent and pre-operative optimization pathway.
About the journal:
The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.