Community-acquired pneumonia identification from electronic health records in the absence of a gold standard: A Bayesian latent class analysis.

Impact factor: 7.7
PLOS Digital Health, volume 4, issue 7, e0000936. Published: 2025-07-21. eCollection date: 2025-07-01. DOI: 10.1371/journal.pdig.0000936
Jia Wei, Kevin Yuan, Augustine Luk, A Sarah Walker, David W Eyre
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12279105/pdf/
Citations: 0

Abstract

Community-acquired pneumonia (CAP) is common and a significant cause of mortality. However, CAP surveillance commonly relies on diagnostic codes from electronic health records (EHRs), with imperfect accuracy. We used Bayesian latent class models with multiple imputation to assess the accuracy of CAP diagnostic codes in the absence of a gold standard and to explore the contribution of various EHR data sources in improving CAP identification. Using 491,681 hospital admissions in Oxfordshire, UK, from 2016 to 2023, we investigated four EHR-based algorithms for CAP detection based on 1) primary diagnostic codes, 2) clinician-documented indications for antibiotic prescriptions, 3) radiology free-text reports, and 4) vital signs and blood tests. The estimated prevalence of CAP as the reason for emergency hospital admission was 13.6% (95% credible interval 13.3-14.0%). Primary diagnostic codes had low sensitivity but a high specificity (best fitting model, 0.275 and 0.997 respectively), as did vital signs with blood tests (0.348 and 0.963). Antibiotic indication text had a higher sensitivity (0.590) but a lower specificity (0.982), with radiology reports intermediate (0.485 and 0.960). Defining CAP as present when detected by any algorithm produced sensitivity and specificity of 0.873 and 0.905 respectively. Results remained consistent using alternative priors and in sensitivity analyses. Relying solely on diagnostic codes for CAP surveillance leads to substantial under-detection; combining EHR data across multiple algorithms enhances identification accuracy. Bayesian latent class analysis-based approaches could improve CAP surveillance and epidemiological estimates by integrating multiple EHR sources, even without a gold standard for CAP diagnosis.
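
To make the reported accuracy figures concrete, the following minimal sketch plugs the point estimates quoted above into standard screening formulas. It is not the authors' Bayesian latent class model. The function names are illustrative, and the "any algorithm positive" calculation assumes the four algorithms are conditionally independent given true CAP status, which the abstract does not state.

```python
from math import prod

# Point estimates quoted in the abstract (best-fitting model):
# algorithm -> (sensitivity, specificity)
ALGORITHMS = {
    "primary diagnostic codes": (0.275, 0.997),
    "antibiotic indication text": (0.590, 0.982),
    "radiology reports": (0.485, 0.960),
    "vital signs and blood tests": (0.348, 0.963),
}
PREVALENCE = 0.136  # estimated CAP prevalence among emergency admissions


def any_positive_rule(tests):
    """Sensitivity and specificity of 'CAP if any test is positive'.

    ASSUMPTION: tests are conditionally independent given true CAP status.
    """
    tests = list(tests)
    # A true case is missed only if every test misses it.
    sensitivity = 1 - prod(1 - se for se, _ in tests)
    # A non-case is correctly negative only if every test is negative.
    specificity = prod(sp for _, sp in tests)
    return sensitivity, specificity


def apparent_prevalence(se, sp, prev):
    """Fraction of admissions flagged: true positives plus false positives."""
    return se * prev + (1 - sp) * (1 - prev)


def ppv(se, sp, prev):
    """Positive predictive value via Bayes' rule."""
    return se * prev / apparent_prevalence(se, sp, prev)


combined_se, combined_sp = any_positive_rule(ALGORITHMS.values())
print(f"any-positive rule: sensitivity={combined_se:.3f}, specificity={combined_sp:.3f}")

code_se, code_sp = ALGORITHMS["primary diagnostic codes"]
print(f"codes alone flag {apparent_prevalence(code_se, code_sp, PREVALENCE):.1%} of admissions")
print(f"PPV of a CAP code: {ppv(code_se, code_sp, PREVALENCE):.3f}")
```

Under the independence assumption, the any-positive rule gives a specificity of about 0.905, matching the reported value, and a sensitivity of about 0.900, somewhat above the reported 0.873. One possible reading is that the fitted model allows some dependence between algorithms among true cases, but the abstract does not say so. Diagnostic codes alone would flag roughly 4.0% of admissions against an estimated true prevalence of 13.6%, which illustrates the under-detection the abstract describes.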
