Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes.

Impact factor: 2.5
Harvard Data Science Review. Pub Date: 2022-01-01. Epub Date: 2022-04-28. DOI: 10.1162/99608f92.cbe67e91
Neal D Goldstein, Deborah Kahal, Karla Testa, Ed J Gracely, Igor Burstyn
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9624477/pdf/
Citations: 5

Abstract

It is incumbent upon all researchers who use the electronic health record (EHR), including data scientists, to understand the quality of such data. EHR data may be subject to measurement error or misclassification that can bias results unless one applies the computational techniques developed specifically for this problem. In this article, we begin with a discussion of data-quality issues in the EHR, focusing on health outcomes. We review the concepts of sensitivity, specificity, and positive and negative predictive values, and demonstrate how imperfect classification of a dichotomous outcome variable can bias an analysis, both in terms of the prevalence of the outcome and the relative risk of the outcome under one treatment regime (i.e., exposure) compared to another. We then describe a generalizable approach to probabilistic (quantitative) bias analysis that combines regression estimation of the parameters relating the true and observed data with application of these estimates to estimate the prevalence and relative risk that may have existed had there been no misclassification. We describe bias analysis that accounts for both random and systematic errors and highlight its limitations. We then motivate a case study with the goal of validating the accuracy of a health outcome, chronic infection with hepatitis C virus, derived from a diagnostic code in the EHR. Finally, we demonstrate our approaches on the case study and conclude by summarizing the literature on outcome misclassification and quantitative bias analysis.
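As a rough illustration of the ideas summarized in the abstract, the Python sketch below shows how imperfect sensitivity and specificity distort an observed prevalence and relative risk, and how a simple Monte Carlo bias analysis can propagate uncertainty about those parameters. It is not the authors' method: the abstract describes a regression-based approach, whereas this sketch swaps in the standard Rogan-Gladen correction with Beta-distributed sensitivity and specificity. All function names, Beta parameters, and numeric inputs are hypothetical and are not taken from the hepatitis C case study. The sketch also assumes nondifferential misclassification and ignores random sampling error.

```python
import random


def observed_prevalence(true_prev, se, sp):
    """Expected apparent prevalence under imperfect classification.

    True positives are flagged with probability se; true negatives are
    wrongly flagged with probability (1 - sp).
    """
    return se * true_prev + (1 - sp) * (1 - true_prev)


def corrected_prevalence(obs_prev, se, sp):
    """Rogan-Gladen estimator: invert observed_prevalence, clipped to [0, 1]."""
    if se + sp <= 1:
        raise ValueError("correction requires se + sp > 1")
    est = (obs_prev + sp - 1) / (se + sp - 1)
    return min(1.0, max(0.0, est))


def corrected_relative_risk(obs_risk_exposed, obs_risk_unexposed, se, sp):
    """Relative risk after correcting each group's observed risk.

    Assumes (hypothetically) the same se and sp in both exposure groups,
    i.e. nondifferential outcome misclassification.
    """
    p1 = corrected_prevalence(obs_risk_exposed, se, sp)
    p0 = corrected_prevalence(obs_risk_unexposed, se, sp)
    if p0 == 0:
        raise ValueError("corrected risk in the unexposed group is zero")
    return p1 / p0


def probabilistic_bias_analysis(obs_prev, se_beta, sp_beta, n_draws=10000, seed=0):
    """Monte Carlo bias analysis for prevalence.

    se_beta and sp_beta are (alpha, beta) tuples for assumed Beta
    distributions on sensitivity and specificity. Returns the 2.5th,
    50th, and 97.5th percentiles of the corrected prevalence.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        se = rng.betavariate(*se_beta)
        sp = rng.betavariate(*sp_beta)
        if se + sp <= 1:
            continue  # skip draws where the classifier is no better than chance
        draws.append(corrected_prevalence(obs_prev, se, sp))
    draws.sort()
    n = len(draws)
    return draws[int(0.025 * n)], draws[n // 2], draws[int(0.975 * n)]


if __name__ == "__main__":
    # Illustrative numbers only: true risks of 0.20 (exposed) and 0.10 (unexposed).
    se, sp = 0.80, 0.95
    obs1 = observed_prevalence(0.20, se, sp)
    obs0 = observed_prevalence(0.10, se, sp)
    print("observed RR:", obs1 / obs0)
    print("corrected RR:", corrected_relative_risk(obs1, obs0, se, sp))
    print("simulation interval:", probabilistic_bias_analysis(obs0, (80, 20), (95, 5)))
```

With these illustrative inputs the observed relative risk is pulled toward the null compared with the true value of 2, which is the kind of bias the abstract refers to. A fuller analysis would also model random error and allow sensitivity and specificity to differ by exposure group.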

