Machine learning model for prediction of coronavirus disease 2019 within 6 months after three doses of BNT162b2 in Hong Kong
J T Tan, R Zhang, K H Chan, J Qin, I F N Hung, K S Cheung
Hong Kong Medical Journal, published 23 June 2025. DOI: 10.12809/hkmj2411879
Abstract
Introduction: We aimed to develop a machine learning (ML) model to predict the risk of coronavirus disease 2019 (COVID-19) among three-dose BNT162b2 vaccine recipients in Hong Kong.
Methods: A total of 304 individuals who had received three doses of BNT162b2 were recruited from three vaccination centres in Hong Kong between May and August 2021. The dataset was randomly divided into training (n=184) and testing (n=120) sets in an approximately 6:4 ratio. Demographics, co-morbidities and medications, blood tests (complete blood count, liver and renal function tests, glycated haemoglobin level, lipid profile, and presence of hepatitis B surface antigen), and controlled attenuation parameter (CAP) were used to develop six ML models (logistic regression, linear discriminant analysis, random forest, naïve Bayes, neural network [NN], and extreme gradient boosting models) to predict COVID-19 risk. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
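The performance measures listed in the Methods can be made concrete with a short, standard-library-only Python sketch. It computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case (ties count one half), then derives sensitivity, specificity, PPV, and NPV from the confusion matrix at a fixed cut-off. The function names, the `score >= cutoff` convention, and the toy scores are illustrative assumptions, not the authors' code or data.

```python
from typing import Dict, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random case > score of a random non-case); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_at(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[int, int, int, int]:
    """Treat score >= cutoff as 'predicted COVID-19'; return (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Undefined proportions (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    return {
        "sensitivity": _ratio(tp, tp + fn),  # cases correctly flagged
        "specificity": _ratio(tn, tn + fp),  # non-cases correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged people who became cases
        "npv": _ratio(tn, tn + fn),          # cleared people who stayed uninfected
    }


if __name__ == "__main__":
    # Toy scores and outcomes (1 = COVID-19 within 6 months); not study data.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(auc_mann_whitney(scores, labels))  # 0.75
    print(metrics(*confusion_at(scores, labels, 0.5)))  # every metric is 0.75
```

The pairwise AUC loop costs one comparison per case/non-case pair, which is trivial at the size of a 120-person test set; a rank-based formula scales better for large data.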
Results: Among the study population (median age: 50.9 years [interquartile range=43.6-57.8]; men: 30.9% [n=94]), 27 participants (8.9%) developed COVID-19 within 6 months. Fifteen clinical variables were used to train the models. The NN model achieved the best performance, with an AUC of 0.74 (95% confidence interval [95% CI]=0.60-0.88). At the optimal cut-off, selected by maximising the Youden index, the sensitivity, specificity, PPV, and NPV were 90% (95% CI=55%-100%), 58% (95% CI=48%-68%), 16% (95% CI=8%-29%), and 98% (95% CI=92%-100%), respectively. The top predictors in the NN model included age, prediabetes/diabetes, CAP, alanine aminotransferase level, and aspartate aminotransferase level.
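The cut-off selection and interval estimates in the Results can be sketched the same way. The code below tries each observed score as a candidate cut-off and keeps the one maximising Youden's J = sensitivity + specificity - 1, then computes an exact (Clopper-Pearson) binomial interval by bisection on the binomial CDF. The abstract does not say which interval method was used, so Clopper-Pearson is an assumption here, and the counts in the demo (9 of 10, 64 of 110) are hypothetical values near the reported 90% and 58%, not the study's confusion matrix.

```python
import math
from typing import Callable, Sequence, Tuple


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    Each distinct observed score is tried as a cut-off, with score >= cutoff
    counted as a predicted case; the first cut-off reaching the maximum wins.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_cut, best_j = float("nan"), -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], target: float) -> float:
    """Solve f(p) = target on [0, 1] for f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for the proportion x/n."""
    # Lower bound: P(X >= x | p) = alpha/2; this tail probability rises with p.
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2; this falls with p, so negate it.
    upper = 1.0 if x == n else _bisect(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # toy data, not study data
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(youden_cutoff(scores, labels))  # (0.6, 0.5)
    # Hypothetical counts near the reported sensitivity and specificity.
    print("sensitivity 9/10:", clopper_pearson(9, 10))
    print("specificity 64/110:", clopper_pearson(64, 110))
```

Scanning only the observed scores is sufficient because, under the `score >= cutoff` rule, sensitivity and specificity change only when the cut-off crosses an observed score.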
Conclusion: An NN model integrating 15 clinical variables effectively identified individuals at low risk of COVID-19 following three doses of BNT162b2.