A maximum likelihood approach to electronic health record phenotyping using positive and unlabeled patients

Journal of the American Medical Informatics Association: JAMIA. Published 2019-11-13. DOI: 10.1093/jamia/ocz170

Lingjiao Zhang, Xiruo Ding, Yanyuan Ma, Naveen Muthu, I. Ajmal, J. Moore, D. Herman, Jinbo Chen

Citations: 15

Abstract

OBJECTIVE
Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and is therefore labor intensive. For some phenotypes, identifying gold-standard controls is prohibitively difficult. We developed an accurate EHR phenotyping approach that does not require labeled controls.

MATERIALS AND METHODS
Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and whose sensitivity is independent of the predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and we compared model performance with that of existing algorithms.

RESULTS
Our method outperformed the existing algorithms in predictive accuracy in Monte Carlo simulation studies, in an application identifying hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and in an application identifying primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters: phenotype prevalence and the proportion of true cases that are labeled.

DISCUSSION
Once an anchor variable that is scalable and transferable to different practices has been identified, our approach should facilitate the development of scalable, transferable, and practice-specific phenotyping models.

CONCLUSIONS
Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
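
The following is a minimal sketch, not the authors' implementation, of one way such a positive-unlabeled likelihood can be written. It assumes that every patient has an observed anchor indicator s, that anchor-positive patients are true cases (PPV = 1), and that each true case is anchor-positive with a constant probability c that does not depend on the predictors x. Under these assumptions P(S=1 | x) = c · expit(α + βᵀx), so α, β, and c can be estimated jointly by maximizing the Bernoulli likelihood of s. Prevalence is then estimated as the average fitted P(true case | x). All function names (`pu_log_likelihood`, `fit_pu_logistic`, `estimate_prevalence`, `simulate`) and the simulation settings are hypothetical, and plain gradient ascent with step halving stands in for whatever optimizer the authors actually used.

```python
import math
import random

# Hypothetical sketch, not the authors' code. Assumptions: anchor-positive
# patients are true cases (PPV = 1), and every true case is anchor-positive
# with the same probability c, independent of the predictors x.


def sigmoid(z):
    """Numerically stable logistic (expit) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(alpha, beta, xi):
    return alpha + sum(b * x for b, x in zip(beta, xi))


def pu_log_likelihood(alpha, beta, c, X, s, eps=1e-12):
    """Bernoulli log-likelihood of anchor labels s under
    P(S=1 | x) = c * sigmoid(alpha + beta.x)."""
    ll = 0.0
    for xi, si in zip(X, s):
        q = min(max(c * sigmoid(linear(alpha, beta, xi)), eps), 1.0 - eps)
        ll += math.log(q) if si == 1 else math.log(1.0 - q)
    return ll


def _gradient(alpha, beta, c, X, s, eps=1e-12):
    """Average gradient w.r.t. (alpha, beta, gamma), where c = sigmoid(gamma)."""
    g_alpha, g_beta, g_gamma = 0.0, [0.0] * len(beta), 0.0
    for xi, si in zip(X, s):
        p = sigmoid(linear(alpha, beta, xi))  # P(true case | x)
        q = c * p                              # P(anchor-positive | x)
        r = (si - q) / max(1.0 - q, eps)
        g_alpha += r * (1.0 - p)
        for j, x in enumerate(xi):
            g_beta[j] += r * (1.0 - p) * x
        g_gamma += r * (1.0 - c)
    n = len(s)
    return g_alpha / n, [g / n for g in g_beta], g_gamma / n


def fit_pu_logistic(X, s, lr=1.0, n_iter=300):
    """Maximize the likelihood by gradient ascent with step halving.
    Returns (alpha, beta, c)."""
    alpha, beta, gamma = 0.0, [0.0] * len(X[0]), 0.0
    ll = pu_log_likelihood(alpha, beta, sigmoid(gamma), X, s)
    for _ in range(n_iter):
        ga, gb, gg = _gradient(alpha, beta, sigmoid(gamma), X, s)
        step = lr
        while step > 1e-8:
            a_new = alpha + step * ga
            b_new = [b + step * g for b, g in zip(beta, gb)]
            g_new = gamma + step * gg
            ll_new = pu_log_likelihood(a_new, b_new, sigmoid(g_new), X, s)
            if ll_new >= ll:  # accept only non-decreasing steps
                alpha, beta, gamma, ll = a_new, b_new, g_new, ll_new
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
    return alpha, beta, sigmoid(gamma)


def estimate_prevalence(alpha, beta, X):
    """Phenotype prevalence as the average fitted P(true case | x)."""
    return sum(sigmoid(linear(alpha, beta, xi)) for xi in X) / len(X)


def simulate(n=1000, alpha=-1.0, beta=(1.5,), c=0.5, seed=0):
    """Hypothetical simulation: true status y, anchor s positive only for some cases."""
    rng = random.Random(seed)
    X, y, s = [], [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in beta]
        yi = 1 if rng.random() < sigmoid(linear(alpha, beta, xi)) else 0
        si = 1 if yi == 1 and rng.random() < c else 0
        X.append(xi)
        y.append(yi)
        s.append(si)
    return X, y, s


if __name__ == "__main__":
    X, y, s = simulate()
    alpha_hat, beta_hat, c_hat = fit_pu_logistic(X, s)
    print("sample prevalence of true cases:", sum(y) / len(y))
    print("estimated prevalence:", estimate_prevalence(alpha_hat, beta_hat, X))
    print("estimated proportion of cases labeled (c):", c_hat)
```

Because c · expit(·) can never exceed c, c is identified in this sketch only through the assumed logistic form of P(Y=1 | x), so its estimate may be imprecise in small samples. The paper's own likelihood may be structured differently, for example if labeled cases are sampled separately from the unlabeled pool.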
