[Early warning method for invasive mechanical ventilation in septic patients based on machine learning model].

JCR: Q3 (Medicine)

Zhonghua wei zhong bing ji jiu yi xue; Pub Date: 2025-07-01; DOI: 10.3760/cma.j.cn121430-20240422-00368

Wanjun Liu, Wenyan Xiao, Jin Zhang, Juanjuan Hu, Shanshan Huang, Yu Liu, Tianfeng Hua, Min Yang

Zhonghua wei zhong bing ji jiu yi xue, 2025, 37(7): 644-650. Publication type: Journal Article.

Abstract

Objective: To develop a method for identifying high-risk patients among septic patients requiring mechanical ventilation and to perform a phenotypic analysis based on this method.

Methods: Data from four sources were used: the Medical Information Mart for Intensive Care (MIMIC-IV 2.0 and MIMIC-III 1.4), the Philips eICU Collaborative Research Database 2.0 (eICU-CRD 2.0), and a dataset from the Second Affiliated Hospital of Anhui Medical University. Adult intensive care unit (ICU) patients who met the Sepsis-3 criteria and received invasive mechanical ventilation (IMV) on the first day of their first ICU admission were enrolled. The MIMIC-IV dataset, which had the highest data integrity, was divided into a training set and a test set at a 6:1 ratio, and the remaining datasets served as validation sets. Demographic information, comorbidities, laboratory indicators, commonly used ICU scores, and treatment measures were extracted. Clinical data collected within the first day of ICU admission were used to calculate the Sequential Organ Failure Assessment (SOFA) score. K-means clustering was applied to the SOFA score components, and the sum of squared errors (SSE) and the Davies-Bouldin index (DBI) were used to determine the optimal number of disease subtypes. For the resulting clusters, baseline characteristics were normalized and compared visually, and Kaplan-Meier curves were used to analyze clinical outcomes across phenotypes.
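The clustering step can be illustrated with a minimal Python sketch. It assumes each patient is represented as a tuple of the six SOFA component scores (0-4 each) and uses a plain Lloyd's K-means written from scratch, scoring each candidate number of clusters k by SSE and the Davies-Bouldin index. The function names (`split_6_to_1`, `kmeans`, `scan_k`, etc.), the random initialization, the absence of feature scaling, and the toy cohort are hypothetical illustrations, not the authors' implementation; the abstract does not state which software, initialization, or preprocessing was used.

```python
import math
import random

# Hypothetical ordering of the six SOFA organ components (each scored 0-4).
SOFA_COMPONENTS = ("respiration", "coagulation", "liver",
                   "cardiovascular", "cns", "renal")


def split_6_to_1(rows, seed=0):
    """Shuffle rows and split them into training and test sets at a 6:1 ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * 6 / 7)
    return rows[:n_train], rows[n_train:]


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means on hashable tuples; returns (labels, centroids).

    Initial centroids are k distinct data points, so every cluster starts non-empty.
    """
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)
    labels = None
    for _ in range(n_iter):
        # Assign each patient to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        clusters = [[p for p, lab in zip(points, labels) if lab == j]
                    for j in range(k)]
        # Move each centroid to its cluster mean (keep it if the cluster emptied).
        centroids = [_mean(members) if members else centroids[j]
                     for j, members in enumerate(clusters)]
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared errors (within-cluster sum of squares); lower is tighter."""
    return sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / d(c_i, c_j).

    S_i is the mean distance of cluster i's members to its centroid. Lower is better.
    """
    k = len(centroids)
    scatter = []
    for j in range(k):
        dists = [math.dist(p, centroids[j])
                 for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dists) / len(dists) if dists else 0.0)

    def ratio(i, j):
        d = math.dist(centroids[i], centroids[j])
        return (scatter[i] + scatter[j]) / d if d > 0 else math.inf

    worst = [max(ratio(i, j) for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def scan_k(points, k_values=range(2, 7), seed=0):
    """Fit K-means for each candidate k and return {k: (SSE, DBI)}."""
    results = {}
    for k in k_values:
        labels, centroids = kmeans(points, k, seed=seed)
        results[k] = (sse(points, labels, centroids),
                      davies_bouldin(points, labels, centroids))
    return results


if __name__ == "__main__":
    # Hypothetical toy cohort built from three made-up SOFA profiles; NOT study data.
    rng = random.Random(42)
    profiles = [(3, 1, 1, 4, 1, 2), (3, 1, 2, 1, 1, 3), (1, 0, 0, 0, 0, 0)]
    cohort = [tuple(min(4, max(0, v + rng.choice((-1, 0, 0, 1)))) for v in prof)
              for prof in profiles for _ in range(100)]
    train, test = split_6_to_1(cohort)
    print(f"train={len(train)} test={len(test)}")
    for k, (s, d) in scan_k(train).items():
        print(f"k={k}: SSE={s:.1f}  DBI={d:.3f}")
```

SSE always decreases as k grows, so it is usually read for an "elbow" alongside a low DBI; this sketch only reports both values and leaves the final choice of k to the analyst.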

Results: This study enrolled patients from the MIMIC-IV dataset (n = 11 166), the MIMIC-III dataset (n = 4 821), the eICU-CRD dataset (n = 6 624), and a local dataset (n = 110). The four datasets had similar median ages, and males made up more than 50% of each. Of the MIMIC-IV dataset, 85% was used as the training set and 15% as the test set, and the remaining datasets were used as validation sets. K-means clustering based on the six SOFA score components identified 3 as the optimal number of clusters, and patients were classified into three phenotypes. In the training set, compared with patients with phenotypes II and III, patients with phenotype I had more severe circulatory and respiratory dysfunction, a higher proportion of vasoactive drug use, more pronounced metabolic acidosis and hypoxia, and a higher incidence of congestive heart failure. Phenotype II was dominated by respiratory dysfunction with greater visceral organ injury. Patients with phenotype III had relatively stable organ function. These characteristics were consistent in both the test and validation sets. Analysis of infection-related indicators showed that patients with phenotype I had the highest SOFA scores within 7 days after ICU admission, a platelet count (PLT) that first decreased and then increased, and higher neutrophil, lymphocyte, and monocyte counts than patients with phenotypes II and III. Their blood cultures also had higher positivity rates for Gram-positive bacteria, Gram-negative bacteria, and fungi than those of patients with phenotypes II and III. Kaplan-Meier curves showed that in the training, test, and validation sets, the 28-day cumulative mortality of patients with phenotype I was significantly higher than that of patients with phenotypes II and III.
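The 28-day cumulative mortality compared across phenotypes corresponds to 1 - S(28) on a Kaplan-Meier curve. The sketch below, assuming per-patient follow-up in days with a death indicator, computes this value per phenotype. The record layout and function names are hypothetical, the toy records are not study data, and the statistical test the authors used to compare the curves is not specified in the abstract and is not reproduced here.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up in days; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def cumulative_mortality(times, events, horizon=28):
    """1 - S(horizon), after administratively censoring follow-up at the horizon."""
    t_cens = [min(t, horizon) for t in times]
    e_cens = [e if t <= horizon else 0 for t, e in zip(times, events)]
    surv = 1.0
    for _, s in kaplan_meier(t_cens, e_cens):
        surv = s  # every returned time is <= horizon after censoring
    return 1 - surv


def mortality_by_phenotype(records, horizon=28):
    """records: iterable of (phenotype, days, died). Returns {phenotype: mortality}."""
    groups = defaultdict(lambda: ([], []))
    for phenotype, days, died in records:
        groups[phenotype][0].append(days)
        groups[phenotype][1].append(died)
    return {p: cumulative_mortality(ts, es, horizon)
            for p, (ts, es) in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical records (phenotype, days to death or last follow-up, died); NOT study data.
    toy = [("I", 5, 1), ("I", 10, 1), ("I", 10, 0), ("I", 20, 1), ("I", 30, 1),
           ("III", 8, 1), ("III", 28, 0), ("III", 40, 0), ("III", 60, 0)]
    for phenotype, m in mortality_by_phenotype(toy).items():
        print(f"phenotype {phenotype}: 28-day cumulative mortality = {m:.2f}")
```

Running it prints a 28-day cumulative mortality of 0.70 for the toy phenotype I group and 0.25 for the toy phenotype III group; the death on day 30 is censored at day 28 and therefore does not count.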

Conclusions: Unsupervised machine learning identified three distinct phenotypes among septic patients receiving IMV. Phenotype I, characterized by cardiorespiratory failure, can be used for the early identification of high-risk patients in this population. Patients with phenotype I are also more prone to bloodstream infections, are at high risk, and have a poor prognosis.

Source journal: Zhonghua wei zhong bing ji jiu yi xue (Medicine - Critical Care and Intensive Care Medicine)

CiteScore: 1.00

Self-citation rate: 0.00%