Comparing conventional and Bayesian workflows for clinical outcome prediction modelling with an exemplar cohort study of severe COVID-19 infection incorporating clinical biomarker test results.

IF 3.3; Zone 3 (Medicine); Q2, Medical Informatics
Brian Sullivan, Edward Barker, Louis MacGregor, Leo Gorman, Philip Williams, Ranjeet Bhamber, Matt Thomas, Stefan Gurney, Catherine Hyams, Alastair Whiteway, Jennifer A Cooper, Chris McWilliams, Katy Turner, Andrew W Dowsey, Mahableshwar Albur
BMC Medical Informatics and Decision Making, 2025; 25(1): 123. Published 2025-03-10.
DOI: 10.1186/s12911-025-02955-3
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11892292/pdf/
Citations: 0

Abstract

Purpose: Assessing risk factors and creating prediction models from real-world medical data is challenging and requires numerous modelling decisions made with clinical guidance. Logistic regression is a common model for such studies. For it, we advocate Bayesian methods, which can jointly deliver probabilistic risk factor inference and prediction. As an exemplar, we compare two approaches to predicting severe COVID-19 outcomes (death or ICU admission) from demographic and laboratory biomarker data. The first is Bayesian logistic regression with horseshoe priors and Projective Prediction variable selection. The second is the established frequentist LASSO approach. Our study also serves as a guide to data curation, variable selection, and performance assessment with cross-validation.
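
For reference, the two penalisation strategies compared above can be written in their standard textbook forms. This is a sketch of the generic models only. The abstract does not give the study's exact priors, hyperparameters (for example the global scale $\tau_0$), or Projective Prediction search settings, so those are assumptions here. The LASSO penalty is written $\gamma$ to avoid clashing with the horseshoe's local scales $\lambda_j$.

```latex
\begin{align*}
% Shared logistic regression likelihood for patient i with covariates x_i
y_i &\sim \mathrm{Bernoulli}(p_i), \quad
\operatorname{logit}(p_i) = \alpha + \mathbf{x}_i^{\top}\boldsymbol{\beta}, \quad i = 1,\dots,n \\
% Frequentist LASSO: L1-penalised maximum likelihood, gamma tuned by cross-validation
\text{LASSO:}\quad (\hat{\alpha}, \hat{\boldsymbol{\beta}}) &= \arg\min_{\alpha,\boldsymbol{\beta}}
\Big\{ -\sum_{i=1}^{n} \big[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \big]
+ \gamma \sum_{j=1}^{d} \lvert \beta_j \rvert \Big\} \\
% Bayesian horseshoe prior: global scale tau, local scales lambda_j (half-Cauchy)
\text{Horseshoe:}\quad \beta_j \mid \lambda_j, \tau &\sim \mathcal{N}(0, \lambda_j^2 \tau^2), \quad
\lambda_j \sim \mathcal{C}^{+}(0, 1), \quad \tau \sim \mathcal{C}^{+}(0, \tau_0)
\end{align*}
```

Both approaches shrink coefficients towards zero, but in different ways. The LASSO optimum sets some coefficients exactly to zero. The horseshoe instead gives a full posterior in which weak effects are strongly shrunk but not exactly zero. A separate selection step such as Projective Prediction is therefore used to find a smaller submodel whose predictions stay close to those of the full reference model.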

Methods: Our source data come from a retrospective observational cohort of records from three National Health Service (NHS) Trusts in southwest England, UK. Models were fit to predict severe outcomes within 28 days after hospital admission (or after a positive PCR result, if the patient was already admitted). Predictors were demographic data and the first result of each of 30 biomarker tests collected within 3 days after admission (or after testing positive, if already admitted).
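
The cohort-definition rules in the Methods paragraph can be made concrete with a small sketch. It is a minimal illustration under stated assumptions, not the study's code:

- The record classes and field names (`Patient`, `LabResult`, `admitted`, `pcr_positive`, `taken`) are hypothetical.
- The index time is taken as the later of admission and the positive PCR. This is our reading of "or a positive PCR result if already admitted".
- Both windows are treated as inclusive and measured forward from the index time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical record layouts: the study's actual data schema is not described.
@dataclass
class Patient:
    patient_id: str
    admitted: datetime              # hospital admission time
    pcr_positive: datetime          # first positive SARS-CoV-2 PCR result
    death: Optional[datetime] = None
    icu: Optional[datetime] = None  # ICU admission time, if any

@dataclass
class LabResult:
    patient_id: str
    test: str                       # e.g. "CRP", "Urea"
    taken: datetime
    value: float

BIOMARKER_WINDOW = timedelta(days=3)   # first result within 3 days of the index time
OUTCOME_WINDOW = timedelta(days=28)    # severe outcome within 28 days of the index time

def index_time(p: Patient) -> datetime:
    """Admission time, or the positive PCR time if the patient was already admitted.

    Assumption: "already admitted" means the PCR came after admission,
    so the index is simply the later of the two timestamps.
    """
    return max(p.admitted, p.pcr_positive)

def first_results(p: Patient, labs: list[LabResult]) -> dict[str, float]:
    """Earliest value of each test taken in [index, index + 3 days] for patient p."""
    t0 = index_time(p)
    earliest: dict[str, LabResult] = {}
    for r in labs:
        if r.patient_id != p.patient_id:
            continue
        if not (t0 <= r.taken <= t0 + BIOMARKER_WINDOW):
            continue  # outside the biomarker window (before index or too late)
        if r.test not in earliest or r.taken < earliest[r.test].taken:
            earliest[r.test] = r
    return {test: r.value for test, r in earliest.items()}

def severe_outcome(p: Patient) -> bool:
    """True if death or ICU admission falls in [index, index + 28 days]."""
    t0 = index_time(p)
    return any(t is not None and t0 <= t <= t0 + OUTCOME_WINDOW
               for t in (p.death, p.icu))
```

Applying `first_results` to every patient yields one row of up to 30 biomarker values per patient. A test with no result inside the window simply stays absent from the row, so missing values have to be handled by a separate curation decision.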

Results: The cohort comprised 756 hospitalized adults who tested positive for COVID-19 between March and October 2020 (mean age 71; 45% female). Of these, 31% (n=234) had a severe outcome, and 88% of those (n=206) died. Patients were split into training (n=534) and external validation (n=222) groups. Using our Bayesian pipeline, we show that a reduced model outperforms a GLM using all variables. The reduced model uses Age, Urea, Prothrombin time (PT), C-reactive protein (CRP), and Neutrophil-Lymphocyte ratio (NLR). It achieved a median external AUC of 0.71 (95% quantile interval [0.70, 0.72]), compared with an external AUC of 0.67 [0.63, 0.71] for the full GLM.
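
The sketch below reproduces the form, though not the values, of the external AUC summaries above. The abstract does not say how the interval was obtained, so the following are assumptions:

- The input is a set of prediction vectors for the validation patients, for example one per posterior draw of the Bayesian model.
- The sketch computes the rank-based (Mann-Whitney) AUC for each vector.
- It reports the median together with the 2.5% and 97.5% quantiles.

The function names are hypothetical.

```python
import statistics
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC.

    Probability that a randomly chosen severe case (label 1) scores higher
    than a randomly chosen non-severe case (label 0); ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarise_auc(draws: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> tuple[float, float, float]:
    """Median AUC and central 95% quantile interval over prediction draws.

    Assumption: each element of `draws` is one vector of predicted
    probabilities for the same validation patients (e.g. one posterior draw).
    """
    aucs = [auc(d, labels) for d in draws]
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; "inclusive" interpolates
    # linearly between order statistics, treating min/max as 0%/100%.
    cuts = statistics.quantiles(aucs, n=40, method="inclusive")
    return statistics.median(aucs), cuts[0], cuts[-1]
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is negligible for a validation set of 222 patients. Much larger cohorts would use a sorted rank-sum formulation instead.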

Conclusion: Urea, PT, CRP, and NLR have also been highlighted by other studies. Together they suggest that hypovolemia (urea), derangement of circulation via clotting (PT), and inflammation (CRP and NLR) are strong predictive risk factors for severity. This study provides guidance on conventional and Bayesian regression and prediction modelling with complex clinical data.

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.