Hoang Van Dung, Vu Manh Tan, Nguyen Thi Dieu, Pham Van Linh, Nguyen Van Khai, Tran Thi Ngan, Nguyen Thi Thu Phuong
BMC Medical Informatics and Decision Making, 25(1), p. 265. Published 2025-07-15. DOI: https://doi.org/10.1186/s12911-025-03107-3
Development and external validation of a machine learning model for predicting drug-induced immune thrombocytopenia in a real-world hospital cohort.
Background: Drug-induced immune thrombocytopenia (DITP) is a rare but potentially life-threatening adverse drug reaction, often underrecognized due to its nonspecific presentation and the lack of real-time diagnostic tools. Early identification of at-risk patients is critical to improving medication safety and preventing severe complications.
Objective: To develop and externally validate a machine learning model for predicting the risk of DITP using routinely collected hospital data, and to optimize its clinical applicability through threshold adjustment.
Methods: We conducted a retrospective cohort study using electronic medical records from Hai Phong International Hospital (2018-2024) for model development and internal validation. An independent cohort from Hai Phong International Hospital - Vinh Bao (2024) served as the external validation set. Eligible patients had received at least one drug previously implicated in DITP and had serial platelet counts. A Light Gradient Boosting Machine (LightGBM) model was trained on demographic, clinical, laboratory, and pharmacological features. Model performance was assessed using area under the ROC curve (AUC), accuracy, recall, and F1-score. SHapley Additive exPlanations (SHAP) were used to interpret feature contributions. Threshold tuning and decision curve analysis (DCA) were used to assess clinical applicability.
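The threshold tuning and decision curve analysis mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the toy labels, the toy scores, the candidate threshold grid, and all function names are assumptions made for illustration, and the standard net-benefit formula (true positives per patient minus false positives per patient weighted by the odds of the threshold) is used as a generic stand-in for whatever DCA implementation the study used.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_at(y_true: Sequence[int], scores: Sequence[float],
          threshold: float) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true: Sequence[int], scores: Sequence[float],
                      grid: Sequence[float]) -> Tuple[float, float]:
    """Scan a grid of candidate thresholds and keep the first one with the highest F1."""
    best_t, best_f1 = grid[0], f1_at(y_true, scores, grid[0])
    for t in grid[1:]:
        current = f1_at(y_true, scores, t)
        if current > best_f1:
            best_t, best_f1 = t, current
    return best_t, best_f1


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, scores, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (not from the study): 3 positives among 10 patients.
y_true: List[int] = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores: List[float] = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.20, 0.30, 0.60]
grid: List[float] = [0.05, 0.09, 0.25, 0.50]

best = best_f1_threshold(y_true, scores, grid)
print("best threshold and F1 on toy data:", best)
print("net benefit at 0.09 on toy data:", net_benefit(y_true, scores, 0.09))
```

With a rare outcome, the F1-maximizing threshold typically sits well below 0.5, which is consistent with the low optimized threshold reported in the Results. In practice the threshold would be tuned on held-out development data rather than on the external cohort.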
Results: Among 17,546 patients in the training cohort and 1,403 in the external cohort, DITP occurred in 432 (2.46%) and 70 (4.99%) patients, respectively. In internal validation, LightGBM achieved an AUC of 0.860, recall of 0.392, and F1-score of 0.310. External validation confirmed model robustness with an AUC of 0.813 and an F1-score of 0.341 at the optimized threshold (0.09). SHAP analysis identified AST, baseline platelet count, and renal function as key contributors. DCA and clinical impact curves demonstrated potential benefit in supporting real-time risk stratification. Clopidogrel and vancomycin were frequently associated with suspected DITP cases.
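The reported figures can be checked with simple arithmetic. The sketch below recomputes the two incidence percentages from the stated counts and derives the precision implied by the internal-validation recall and F1-score. The implied precision is not reported in the abstract; it is a derived value that assumes recall and F1 were measured at the same threshold, and it inherits the rounding of the published numbers. The function names are hypothetical.

```python
def incidence_percent(cases: int, total: int) -> float:
    """Incidence as a percentage of the cohort."""
    return 100.0 * cases / total


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for P, giving P = F1*R / (2*R - F1)."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Counts and metrics as stated in the Results.
train_incidence = incidence_percent(432, 17546)
external_incidence = incidence_percent(70, 1403)
internal_precision = implied_precision(recall=0.392, f1=0.310)

print(f"training cohort incidence: {train_incidence:.2f}%")
print(f"external cohort incidence: {external_incidence:.2f}%")
print(f"implied internal-validation precision: {internal_precision:.3f}")
```

The recomputed incidences match the reported 2.46% and 4.99%, and the derived precision of roughly 0.26 indicates that, under the stated assumption, about one in four flagged patients in internal validation would be a true DITP case.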
Conclusion: This externally validated machine learning model enables early identification of hospitalized patients at risk of DITP using data available in routine care. Its integration into electronic medical records may support clinical decision-making, reduce diagnostic delays, and improve pharmacovigilance practices in hospital settings.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.