Wanjun Liu, Wenyan Xiao, Jin Zhang, Juanjuan Hu, Shanshan Huang, Yu Liu, Tianfeng Hua, Min Yang
Early warning method for invasive mechanical ventilation in septic patients based on machine learning model
Zhonghua wei zhong bing ji jiu yi xue, 2025-07-01, 37(7): 644-650. Journal Article. DOI: 10.3760/cma.j.cn121430-20240422-00368 (https://doi.org/10.3760/cma.j.cn121430-20240422-00368)
Abstract
Objective: To develop a method for identifying high-risk patients among septic populations requiring mechanical ventilation, and to conduct phenotypic analysis based on this method.
Methods: Data were drawn from four sources: the Medical Information Mart for Intensive Care databases (MIMIC-IV 2.0 and MIMIC-III 1.4), the Philips eICU-Collaborative Research Database 2.0 (eICU-CRD 2.0), and a dataset from the Second Affiliated Hospital of Anhui Medical University. Adult intensive care unit (ICU) patients who met the Sepsis-3 criteria and received invasive mechanical ventilation (IMV) on the first day of their first admission were enrolled. The MIMIC-IV dataset, which had the highest data integrity, was divided into a training set and a test set at a 6:1 ratio, while the remaining datasets served as validation sets. Demographic information, comorbidities, laboratory indicators, commonly used ICU scores, and treatment measures were extracted. Clinical data collected within the first day of ICU admission were used to calculate the sequential organ failure assessment (SOFA) score. K-means clustering was applied to the SOFA score components, and the sum of squared errors (SSE) and the Davies-Bouldin index (DBI) were used to determine the optimal number of disease subtypes. For the resulting clusters, baseline characteristics were normalized and compared visually, and Kaplan-Meier curves were used to analyze clinical outcomes across phenotypes.
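The cluster-number selection described in the Methods can be sketched as follows. This is a minimal, standard-library illustration and not the authors' code: the data are synthetic six-component scores drawn around three made-up centres (real SOFA components are integers from 0 to 4), the farthest-first seeding is an assumption chosen only to keep the run deterministic, and all function names are hypothetical. For each candidate k it reports the SSE (lower means tighter clusters, but it tends to shrink as k grows) and the DBI (lower means compact, well-separated clusters).

```python
import random
from math import dist

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, not from the paper): start at
    points[0], then repeatedly add the point farthest from its nearest seed."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def davies_bouldin(points, centroids, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation_ij."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        d = [dist(p, centroids[c]) for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(d) / len(d) if d else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

# Synthetic stand-in for six SOFA components (NOT study data).
rng = random.Random(0)
centres = [(3.5, 0.5, 0.5, 3.5, 0.5, 0.5),
           (3.0, 2.5, 2.5, 0.5, 0.5, 2.5),
           (0.5, 0.5, 0.5, 0.5, 0.5, 0.5)]
points = [tuple(rng.gauss(m, 0.3) for m in centre)
          for centre in centres for _ in range(40)]

scores = {}
for k in range(2, 7):
    centroids, labels = kmeans(points, k)
    scores[k] = (sse(points, centroids, labels), davies_bouldin(points, centroids, labels))
    print(f"k={k}  SSE={scores[k][0]:.1f}  DBI={scores[k][1]:.3f}")

best_k = min(scores, key=lambda k: scores[k][1])  # lowest DBI
print("k with lowest DBI:", best_k)
```

In practice the SSE curve is usually inspected for an elbow while the DBI is minimized; a library implementation such as scikit-learn's KMeans and davies_bouldin_score would normally replace the hand-written functions above.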
Results: The study enrolled patients from the MIMIC-IV dataset (n = 11 166), the MIMIC-III dataset (n = 4 821), the eICU-CRD dataset (n = 6 624), and a local dataset (n = 110); the four datasets had similar median ages, and the proportion of males exceeded 50% in each. Of the MIMIC-IV dataset, 85% was used as the training set and 15% as the test set, and the remaining datasets served as validation sets. K-means clustering on the six SOFA components identified three as the optimal number of clusters, and patients were accordingly classified into three phenotypes. In the training set, compared with patients with phenotypes II and III, those with phenotype I had more severe circulatory and respiratory dysfunction, a higher proportion of vasoactive drug use, more pronounced metabolic acidosis and hypoxia, and a higher incidence of congestive heart failure. Phenotype II was dominated by respiratory dysfunction with more severe visceral injury. Patients with phenotype III had relatively stable organ function. These characteristics were consistent in the test and validation sets. Analysis of infection-related indicators showed that, compared with patients with phenotypes II and III, those with phenotype I had the highest SOFA scores within 7 days after ICU admission, a platelet count (PLT) that initially decreased and later increased, and higher neutrophil, lymphocyte, and monocyte counts; their blood cultures also had higher positivity rates for Gram-positive bacteria, Gram-negative bacteria, and fungi. Kaplan-Meier curves indicated that, in the training, test, and validation sets, the 28-day cumulative mortality of patients with phenotype I was significantly higher than that of patients with phenotypes II and III.
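The 28-day cumulative mortality read off a Kaplan-Meier curve can be illustrated with a small product-limit estimator. This is a sketch on invented toy follow-up data, not the study's records; the group names and function names are hypothetical, and the significance comparison between curves (for example a log-rank test) is not shown.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct death time.

    durations: follow-up in days; events: 1 = died, 0 = censored (alive at last contact).
    """
    survival, curve = 1.0, []
    for t in sorted({d for d, e in zip(durations, events) if e}):
        at_risk = sum(1 for d in durations if d >= t)          # still under follow-up at t
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def cumulative_mortality(curve, horizon):
    """1 - S(horizon), using the last survival step at or before the horizon."""
    s = 1.0
    for t, surv in curve:
        if t <= horizon:
            s = surv
    return 1 - s

# Toy data only (NOT study data): two hypothetical groups followed for 28 days.
group_a = ([3, 5, 7, 10, 28, 28], [1, 0, 1, 1, 0, 0])
group_b = ([12, 28, 28, 28, 20, 28], [1, 0, 0, 0, 0, 0])

mortality_a = cumulative_mortality(kaplan_meier(*group_a), 28)
mortality_b = cumulative_mortality(kaplan_meier(*group_b), 28)
print(f"group A 28-day cumulative mortality: {mortality_a:.3f}")
print(f"group B 28-day cumulative mortality: {mortality_b:.3f}")
```

The censored patient in group A (day 5) leaves the risk set without counting as a death, which is what distinguishes this estimate from a simple proportion of deaths.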
Conclusions: Three distinct phenotypes of septic patients receiving IMV were derived using unsupervised machine learning. Phenotype I, characterized by cardiorespiratory failure, can be used for the early identification of high-risk patients in this population. Patients with this phenotype are also more prone to bloodstream infections, are at high risk, and have a poor prognosis.