An augmented estimation procedure for EHR-based association studies accounting for differential misclassification

Journal of the American Medical Informatics Association : JAMIA | Pub Date: 2019-10-16 | DOI: 10.1093/jamia/ocz180

Jiayi Tong, Jing Huang, Jessica Chubak, Xuan Wang, J. Moore, R. Hubbard, Yong Chen


Citations: 10

Abstract



OBJECTIVES
The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data.

MATERIALS AND METHODS
The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington.

RESULTS
The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data.

DISCUSSION
Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias.

CONCLUSIONS
The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.
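The idea of combining a gold-standard phenotype from a small validation set with an algorithm-derived phenotype available for everyone can be illustrated with a short simulation. The sketch below is an assumption-laden illustration, not the authors' implementation: it assumes logistic regression models, a validation set drawn as a simple random subsample, and a control-variate style combination in which the validation-only estimate is corrected by the gap between the algorithm-derived estimates from the validation set and from the full cohort. The function names (`fit_logistic`, `augmented_estimate`, `simulate`), the sample sizes, and the misclassification rates are all hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.

    Returns the coefficients and an (n, p) matrix of per-patient
    influence values, used below to estimate covariances.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    infl = np.linalg.solve(hess, (X * (y - p)[:, None]).T).T
    return beta, infl


def augmented_estimate(X_val, y_val, s_val, X_all, s_all):
    """Combine gold-standard (y) and algorithm-derived (s) phenotypes.

    Assumed form: beta_val - C V^{-1} (gamma_val - gamma_all), where
    C and V are estimated from validation-set influence values.
    """
    beta_val, phi = fit_logistic(X_val, y_val)    # gold standard, validation only
    gamma_val, psi = fit_logistic(X_val, s_val)   # algorithm-derived, validation only
    gamma_all, _ = fit_logistic(X_all, s_all)     # algorithm-derived, full cohort
    n = len(y_val)
    cov_bg = phi.T @ psi / n
    var_g = psi.T @ psi / n
    return beta_val - cov_bg @ np.linalg.solve(var_g, gamma_val - gamma_all)


def simulate(seed=0, n_all=5000, n_val=500):
    """Simulate a cohort with differential misclassification (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_all)
    X = np.column_stack([np.ones(n_all), x])
    true_beta = np.array([-1.0, 0.5])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
    # Differential misclassification: error rates depend on the risk factor x.
    sens = np.where(x > 0, 0.90, 0.75)
    spec = np.where(x > 0, 0.95, 0.85)
    s = np.where(y == 1, rng.binomial(1, sens), rng.binomial(1, 1 - spec))
    val = rng.choice(n_all, size=n_val, replace=False)
    return X, y, s, val


if __name__ == "__main__":
    X, y, s, val = simulate()
    print("validation only:", fit_logistic(X[val], y[val])[0])
    print("algorithm only: ", fit_logistic(X, s)[0])
    print("augmented:      ", augmented_estimate(X[val], y[val], s[val], X, s))
```

In this sketch the correction term vanishes when the validation set is the whole cohort, so the augmented estimate then reduces to the ordinary gold-standard fit. Comparing bias and variance as the abstract describes would require repeating the simulation over many seeds.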
