Brian Sullivan, Edward Barker, Louis MacGregor, Leo Gorman, Philip Williams, Ranjeet Bhamber, Matt Thomas, Stefan Gurney, Catherine Hyams, Alastair Whiteway, Jennifer A Cooper, Chris McWilliams, Katy Turner, Andrew W Dowsey, Mahableshwar Albur
BMC Medical Informatics and Decision Making 25(1):123. Published 2025-03-10. DOI: 10.1186/s12911-025-02955-3 (https://doi.org/10.1186/s12911-025-02955-3). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11892292/pdf/
Comparing conventional and Bayesian workflows for clinical outcome prediction modelling with an exemplar cohort study of severe COVID-19 infection incorporating clinical biomarker test results.
Purpose: Assessing risk factors and creating prediction models from real-world medical data is challenging, requiring numerous modelling decisions with clinical guidance. Logistic regression is a common model for such studies, for which we advocate the use of Bayesian methods that can jointly deliver probabilistic risk factor inference and prediction. As an exemplar, we compare Bayesian logistic regression with horseshoe priors and Projective Prediction variable selection with the established frequentist LASSO approach, to predict severe COVID-19 outcomes (death or ICU admittance) from demographic and laboratory biomarker data. Our study serves as guidance on data curation, variable selection, and performance assessment with cross-validation.
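As a rough illustration of the frequentist comparator named above, the following is a minimal sketch of L1-penalised (LASSO) logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the function names, step size, iteration count, and the assumption that predictors are already standardised are all hypothetical choices made for this example. The soft-thresholding step is what sets weak coefficients exactly to zero, which is the variable-selection behaviour that the horseshoe prior with projective prediction approaches from the Bayesian side.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, penalty, step=0.1, iterations=2000):
    """Fit logistic regression with an L1 penalty on the coefficients.

    X: list of rows of (assumed standardised) predictors.
    y: list of 0/1 outcomes.
    penalty: L1 strength; larger values zero out more coefficients.
    The intercept is not penalised. step and iterations are illustrative.
    """
    n, p = len(X), len(X[0])
    intercept = 0.0
    coefs = [0.0] * p
    for _ in range(iterations):
        # Residuals (predicted probability minus outcome) at current parameters.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - target
            for row, target in zip(X, y)
        ]
        # Plain gradient step for the unpenalised intercept.
        intercept -= step * sum(residuals) / n
        # Gradient step followed by soft-thresholding for each coefficient.
        for j in range(p):
            gradient = sum(r * row[j] for r, row in zip(residuals, X)) / n
            coefs[j] = soft_threshold(coefs[j] - step * gradient, step * penalty)
    return intercept, coefs
```

In practice the penalty would be chosen by cross-validation, in line with the performance assessment the abstract describes; this sketch takes it as a fixed input.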
Methods: Our source data is based on a retrospective observational cohort design with records from three National Health Service (NHS) Trusts in southwest England, UK. Models were fit to predict severe outcomes within 28 days after admission to hospital (or a positive PCR result if already admitted) using demographic data and the first result from 30 biomarker tests collected within 3 days after admission (or testing positive if already admitted).
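The curation rule described above (first result per biomarker within 3 days of the index time) can be sketched as follows. This is a hedged illustration only: the record layout, the function names `index_time` and `first_results_in_window`, the inclusive window bounds, and the exclusion of results taken before the index time are assumptions, since the abstract does not specify them.

```python
from datetime import datetime, timedelta


def index_time(admission, first_positive_pcr):
    """Index time: admission, or the positive PCR if it came after admission.

    Assumption: a patient who tested positive before admission is indexed
    at admission, so the later of the two timestamps is used.
    """
    return max(admission, first_positive_pcr)


def first_results_in_window(start, results, window_days=3):
    """Return {test_name: value} using the earliest result of each test.

    results: iterable of (test_name, taken_at, value) tuples.
    Only results with start <= taken_at <= start + window_days are kept
    (inclusive bounds are an assumption of this sketch).
    """
    window_end = start + timedelta(days=window_days)
    earliest = {}
    for test_name, taken_at, value in results:
        if not (start <= taken_at <= window_end):
            continue
        if test_name not in earliest or taken_at < earliest[test_name][0]:
            earliest[test_name] = (taken_at, value)
    return {name: value for name, (_, value) in earliest.items()}
```

Biomarkers with no result inside the window are simply absent from the returned dictionary, which leaves the handling of missing values as a separate, explicit modelling decision.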
Results: The cohort comprised 756 hospitalized adults who tested positive for COVID-19 between March and October 2020: mean age 71, 45% female; 31% (n=234) had a severe outcome, of whom 88% (n=206) died. Patients were split into training (n=534) and external validation (n=222) groups. Using our Bayesian pipeline, we show that a reduced-variable model using Age, Urea, Prothrombin time (PT), C-reactive protein (CRP), and Neutrophil-Lymphocyte ratio (NLR) has better predictive performance (median external AUC: 0.71, 95% quantile interval [0.70, 0.72]) than a GLM using all variables (external AUC: 0.67 [0.63, 0.71]).
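For readers unfamiliar with the metric, the sketch below shows one way an AUC and a median-with-quantile-interval summary of the kind reported above could be computed. It assumes the summary is taken over a collection of AUC values (for example, one per posterior draw); the abstract does not state this, and the function names are hypothetical.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def summarize_auc(auc_values):
    """Return (median, 2.5% quantile, 97.5% quantile) of a set of AUC values."""
    return (
        quantile(auc_values, 0.5),
        quantile(auc_values, 0.025),
        quantile(auc_values, 0.975),
    )
```

The pairwise form is quadratic in the number of patients, which is acceptable for a validation set of a few hundred; a rank-based version would be preferable for much larger cohorts.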
Conclusion: Urea, PT, CRP, and NLR have been highlighted by other studies; they suggest that hypovolemia (urea), derangement of circulation via clotting (PT), and inflammation (CRP and NLR) are strong predictive risk factors for severity. This study provides guidance on conventional and Bayesian regression and prediction modelling with complex clinical data.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.