[Construction and external validation of a machine learning-based prediction model for epilepsy one year after acute stroke].
Wenkao Zhou, Fangli Zhao, Xingqiang Qiu, Yujuan Yang, Tingting Wang, Lingyan Huang
Zhonghua wei zhong bing ji jiu yi xue, 2025-05-01; 37(5): 445-451. DOI: 10.3760/cma.j.cn121430-20241225-01069
Objective: To identify the optimal machine learning algorithm for predicting post-stroke epilepsy (PSE) within one year following acute stroke, establish a nomogram model based on this algorithm, and perform external validation to achieve accurate prediction of secondary epilepsy.
Methods: A total of 870 patients with acute stroke admitted to the emergency department of Xiang'an Hospital of Xiamen University from June 2019 to June 2023 were enrolled for model development (model group). An external validation cohort of 435 patients with acute stroke admitted to the Fifth Hospital of Xiamen during the same period was used to validate the machine learning algorithms and the nomogram model. Patients were classified into control and epilepsy groups according to whether PSE developed within one year. Clinical and laboratory data, including baseline characteristics, stroke location, vascular status, complications, hematologic parameters, and National Institutes of Health Stroke Scale (NIHSS) score, were collected for analysis. Nine machine learning algorithms (logistic regression, CN2 rule induction, K-nearest neighbors, adaptive boosting, random forest, gradient boosting, support vector machine, naive Bayes, and neural network) were applied and their predictive performance was compared. The area under the receiver operating characteristic (ROC) curve (AUC) was used to identify the optimal algorithm. Logistic regression was used to screen risk factors for PSE, and the top 10 predictors were selected to construct the nomogram model. The predictive performance of the nomogram model was evaluated using ROC curves in both the model and validation groups.
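As a minimal sketch of the AUC criterion described in the Methods, the snippet below computes the AUC as the probability that a randomly chosen patient with PSE receives a higher predicted risk than a randomly chosen patient without PSE, and then picks the model with the larger value. The function name `roc_auc`, the two toy models, and all numbers are hypothetical and are not taken from the study, which does not describe its software in this abstract.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive case outscores a random negative case.

    labels: 0/1 outcomes (1 = developed PSE); scores: predicted risks.
    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): six patients, two with PSE.
toy_labels = [0, 0, 1, 0, 1, 0]
model_a = [0.10, 0.20, 0.80, 0.30, 0.60, 0.05]  # ranks both PSE cases highest
model_b = [0.10, 0.70, 0.80, 0.30, 0.20, 0.05]  # ranks one PSE case below two controls

aucs = {"model_a": roc_auc(toy_labels, model_a),
        "model_b": roc_auc(toy_labels, model_b)}
best = max(aucs, key=aucs.get)  # model with the highest AUC
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size; a rank-based formulation would give the same value faster.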
Results: Among the 870 patients in the model group, 29 developed PSE within one year. Among the nine algorithms tested, logistic regression demonstrated the best performance and generalizability, with an AUC of 0.923. Univariate logistic regression identified multiple risk factors for PSE, including platelet count, white blood cell count, red blood cell count, glycated hemoglobin (HbA1c), C-reactive protein (CRP), triglycerides, high-density lipoprotein (HDL), aspartate aminotransferase (AST), alanine aminotransferase (ALT), activated partial thromboplastin time (APTT), thrombin time, D-dimer, fibrinogen, creatine kinase (CK), creatine kinase-MB (CK-MB), lactate dehydrogenase (LDH), serum sodium, lactic acid, anion gap, NIHSS score, brain herniation, periventricular stroke, and carotid artery plaque. Multivariate logistic regression analysis then showed that white blood cell count, HDL, fibrinogen, lactic acid, and brain herniation were independent risk factors [odds ratios (OR) of 1.837, 198.039, 47.025, 11.559, and 70.722, respectively; all P < 0.05]. In the external validation group, univariate logistic regression analysis showed that platelet count, white blood cell count, CRP, triglycerides, APTT, D-dimer, fibrinogen, CK, CK-MB, LDH, NIHSS score, and brain herniation were risk factors for PSE within one year after acute stroke. Multivariate logistic regression analysis then showed that APTT and brain herniation were independent predictors (OR of 0.587 and 116.193, respectively; both P < 0.05). The nomogram model, constructed using 10 key variables (brain herniation, periventricular stroke, carotid artery plaque, white blood cell count, triglycerides, thrombin time, D-dimer, serum sodium, lactic acid, and NIHSS score), achieved an AUC of 0.908 in the model group and 0.864 in the external validation group.
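To make the reported odds ratios easier to read, the sketch below converts the model-group values into logistic regression coefficients using the standard relation OR = exp(beta). It assumes that each OR refers to a one-unit increase in the predictor (or presence versus absence for brain herniation); the abstract does not state the units, so that reading is an assumption. The helper `odds_multiplier` is a hypothetical name introduced for illustration.

```python
import math

# Odds ratios reported for the multivariate analysis in the model group.
reported_or = {
    "white blood cell count": 1.837,
    "HDL": 198.039,
    "fibrinogen": 47.025,
    "lactic acid": 11.559,
    "brain herniation": 70.722,
}

# In logistic regression, OR = exp(beta), so beta = ln(OR).
betas = {name: math.log(value) for name, value in reported_or.items()}


def odds_multiplier(name, delta):
    """Factor by which the odds of PSE change when `name` rises by `delta` units.

    Assumes the reported OR is per one unit of the predictor (not stated in the abstract).
    """
    return math.exp(betas[name] * delta)


# Example: a two-unit rise in white blood cell count multiplies the odds by OR squared.
wbc_two_units = odds_multiplier("white blood cell count", 2)
```

An OR below 1, such as the 0.587 reported for APTT in the validation group, corresponds to a negative coefficient, meaning that higher values are associated with lower odds.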
Conclusions: Among the machine learning algorithms compared, the logistic regression-based model for predicting epilepsy within one year after acute stroke showed the best predictive performance. The nomogram model built from the logistic regression-derived predictors showed strong discriminative power and was validated in an external cohort, suggesting favorable clinical applicability and generalizability.
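A nomogram is a graphical form of a regression model: each predictor value is mapped to points, and the total points are mapped to a predicted risk. The sketch below illustrates that mapping under stated assumptions. The three predictors, their coefficients, their ranges, and the intercept are all invented for illustration, because the abstract reports no nomogram coefficients; the names `points` and `risk_from_points` are likewise hypothetical.

```python
import math

# Hypothetical predictors: (coefficient, minimum value, maximum value).
# All numbers are invented; positive coefficients are assumed throughout.
predictors = {
    "brain herniation": (2.0, 0, 1),   # binary: absent / present
    "NIHSS score": (0.1, 0, 40),
    "lactic acid": (0.5, 0.5, 6.5),
}
intercept = -6.0  # hypothetical

# The predictor with the largest possible effect spans the full 0 to 100 point axis.
max_effect = max(beta * (hi - lo) for beta, lo, hi in predictors.values())


def points(name, value):
    """Points for one predictor value, scaled against the largest effect."""
    beta, lo, _ = predictors[name]
    return 100.0 * beta * (value - lo) / max_effect


def risk_from_points(total_points):
    """Undo the point scaling to get the linear predictor, then apply the logistic function."""
    baseline = intercept + sum(beta * lo for beta, lo, _ in predictors.values())
    linear = baseline + total_points * max_effect / 100.0
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient: herniation present, NIHSS score 20, lactic acid 2.5.
patient = {"brain herniation": 1, "NIHSS score": 20, "lactic acid": 2.5}
total = sum(points(name, value) for name, value in patient.items())
risk = risk_from_points(total)
```

Because the point scale is a linear rescaling of the regression terms, the risk read from the total points equals the risk obtained by evaluating the logistic model directly.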