[Construction and external validation of a machine learning-based prediction model for epilepsy one year after acute stroke].

JCR category: Medicine (Q3)

Zhonghua wei zhong bing ji jiu yi xue. Pub Date: 2025-05-01. DOI: 10.3760/cma.j.cn121430-20241225-01069

Wenkao Zhou, Fangli Zhao, Xingqiang Qiu, Yujuan Yang, Tingting Wang, Lingyan Huang

Journal Article. Vol. 37, No. 5, pp. 445-451

Citations: 0

Abstract



Objective: To identify the optimal machine learning algorithm for predicting post-stroke epilepsy (PSE) within one year after acute stroke, to establish a nomogram model based on this algorithm, and to validate it externally, with the aim of accurately predicting secondary epilepsy.

Methods: A total of 870 patients with acute stroke admitted to the emergency department of Xiang'an Hospital of Xiamen University from June 2019 to June 2023 were enrolled for model development (model group). An external validation cohort of 435 patients with acute stroke admitted to the Fifth Hospital of Xiamen during the same period was used to validate the machine learning algorithms and the nomogram model. Patients were classified into a control group and an epilepsy group according to whether PSE developed within one year. Clinical and laboratory data, including baseline characteristics, stroke location, vascular status, complications, hematologic parameters, and National Institutes of Health Stroke Scale (NIHSS) score, were collected for analysis. The predictive performance of nine machine learning algorithms (logistic regression, CN2 rule induction, K-nearest neighbors, adaptive boosting, random forest, gradient boosting, support vector machine, naive Bayes, and neural network) was evaluated, and the area under the receiver operating characteristic (ROC) curve (AUC) was used to identify the optimal algorithm. Logistic regression was used to screen risk factors for PSE, and the top 10 predictors were selected to construct the nomogram model. The predictive performance of the model was evaluated with ROC curves in both the model group and the validation group.
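The AUC used to rank the nine algorithms has a simple pairwise reading: it is the probability that a randomly chosen patient who developed PSE receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The minimal Python sketch below computes it directly from that definition; the function name `roc_auc` and the toy scores are hypothetical and are not the study's code or data. A library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` returns the same value.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: sequence of 1 (developed PSE) or 0 (control).
    scores: predicted risks, same length as labels.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    # Count case/control pairs ranked correctly; tied pairs score one half.
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two algorithms scored on the same six patients.
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
model_b = [0.6, 0.5, 0.3, 0.4, 0.2, 0.8]
for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(roc_auc(labels, scores), 3))
```

The pairwise loop costs one comparison per case/control pair; for the model group that is 29 × 841 = 24,389 comparisons, which is trivial at this cohort size.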

Results: Of the 870 patients in the model group, 29 developed PSE within one year. Among the nine algorithms tested, logistic regression showed the best performance and generalizability, with an AUC of 0.923. Univariate logistic regression identified the following risk factors for PSE: platelet count, white blood cell count, red blood cell count, glycated hemoglobin (HbA1c), C-reactive protein (CRP), triglycerides, high-density lipoprotein (HDL), aspartate aminotransferase (AST), alanine aminotransferase (ALT), activated partial thromboplastin time (APTT), thrombin time, D-dimer, fibrinogen, creatine kinase (CK), creatine kinase-MB (CK-MB), lactate dehydrogenase (LDH), serum sodium, lactic acid, anion gap, NIHSS score, brain herniation, periventricular stroke, and carotid artery plaque. Multivariate logistic regression further showed that white blood cell count, HDL, fibrinogen, lactic acid, and brain herniation were independent risk factors [odds ratios (ORs) 1.837, 198.039, 47.025, 11.559, and 70.722, respectively; all P < 0.05]. In the external validation group, univariate logistic regression showed that platelet count, white blood cell count, CRP, triglycerides, APTT, D-dimer, fibrinogen, CK, CK-MB, LDH, NIHSS score, and brain herniation were risk factors for PSE one year after acute stroke. Multivariate logistic regression further showed that APTT and brain herniation were independent predictors (ORs 0.587 and 116.193, respectively; both P < 0.05). The nomogram model, constructed from 10 key variables (brain herniation, periventricular stroke, carotid artery plaque, white blood cell count, triglycerides, thrombin time, D-dimer, serum sodium, lactic acid, and NIHSS score), achieved an AUC of 0.908 in the model group and 0.864 in the external validation group.
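The reported ORs and the nomogram are two views of the same kind of logistic model: each coefficient β is a log odds ratio, so OR = e^β (the OR of 70.722 for brain herniation corresponds to β = ln 70.722 ≈ 4.26). For continuous predictors an OR is per one-unit change as the variable was coded, so very large values such as 198.039 for HDL depend on units the abstract does not state. A common nomogram construction (used, for example, by `nomogram()` in the R `rms` package) gives each predictor points proportional to β times its distance from the lowest-risk end of its axis, stretching the predictor with the widest log-odds span to 100 points. The Python sketch below illustrates that construction under stated assumptions: the function names, intercept, coefficients, axis ranges, and patient values are all hypothetical, because the abstract reports none of the nomogram's fitted parameters, and the study's own software and scaling may differ.

```python
import math


def nomogram_points(coefs, ranges, values, max_points=100.0):
    """Score one patient on a Harrell-style nomogram (illustrative sketch).

    coefs:  {name: logistic-regression coefficient beta, i.e. ln(OR) per unit}
    ranges: {name: (lowest, highest) value drawn on that predictor's axis}
    values: {name: the patient's value, assumed to lie within its range}
    """
    # Change in log odds that each axis spans from one end to the other.
    spans = {k: abs(coefs[k]) * (hi - lo) for k, (lo, hi) in ranges.items()}
    # The widest axis is stretched to max_points; the others scale with it.
    scale = max_points / max(spans.values())
    points = {}
    for name, beta in coefs.items():
        lo, hi = ranges[name]
        # Zero points sits at the lowest-risk end of each axis.
        low_risk_end = lo if beta >= 0 else hi
        points[name] = beta * (values[name] - low_risk_end) * scale
    return points, scale


def predicted_risk(intercept, coefs, ranges, values):
    """Convert total nomogram points back into a predicted probability."""
    points, scale = nomogram_points(coefs, ranges, values)
    total = sum(points.values())
    # Log odds of a patient sitting at the lowest-risk end of every axis.
    base = intercept + sum(
        beta * (ranges[k][0] if beta >= 0 else ranges[k][1])
        for k, beta in coefs.items()
    )
    log_odds = base + total / scale
    return total, 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model: every number below is invented for illustration and
# is NOT the study's fitted nomogram.
intercept = -5.0
coefs = {"brain_herniation": 4.0, "nihss_score": 0.1, "aptt_seconds": -0.05}
ranges = {"brain_herniation": (0, 1), "nihss_score": (0, 42), "aptt_seconds": (20, 50)}
patient = {"brain_herniation": 1, "nihss_score": 12, "aptt_seconds": 30}

total, risk = predicted_risk(intercept, coefs, ranges, patient)
print(f"total points: {total:.1f}, predicted one-year PSE risk: {risk:.1%}")
```

Because the total points are only a rescaled linear predictor, `predicted_risk` returns the same probability as evaluating the logistic model directly; the point scale just makes that sum readable on paper.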

Conclusions: Among the machine learning algorithms compared, logistic regression gave the best performance for predicting epilepsy one year after acute stroke. The nomogram model built on the logistic regression-derived predictors showed strong discriminative power and was successfully validated externally, suggesting favorable clinical applicability and generalizability.


Source journal

Zhonghua wei zhong bing ji jiu yi xue (Medicine: Critical Care and Intensive Care Medicine)

CiteScore: 1.00

Self-citation rate: 0.00%

Articles published: