Identifying People Living With or Those at Risk for HIV in a Nationally Sampled Electronic Health Record Repository Called the National Clinical Cohort Collaborative: Computational Phenotyping Study.
Eric Hurwitz, Cara D Varley, A Jerrod Anzolone, Vithal Madhira, Amy L Olex, Jing Sun, Dimple Vaidya, Nada Fadul, Jessica Y Islam, Lesley E Jackson, Kenneth J Wilkins, Zachary Butzin-Dozier, Dongmei Li, Sandra E Safo, Julie A McMurry, Pooja Maheria, Tommy Williams, Shukri A Hassan, Melissa A Haendel, Rena C Patel
JMIR Medical Informatics, volume 13, e68143. Journal article, published 2025-07-11. DOI: 10.2196/68143 (https://doi.org/10.2196/68143)
Abstract
Background: Electronic health records (EHRs) provide valuable insights to address clinical and epidemiological research concerning HIV, including the disproportionate impact of the COVID-19 pandemic on people living with HIV. To identify this population, most studies using EHR or claims databases start with diagnostic codes, which can result in misclassification without further refinement using drug or laboratory data. Furthermore, given that antiretrovirals now have indications for both HIV and COVID-19 (ie, ritonavir in nirmatrelvir/ritonavir), new phenotyping methods are needed to better capture people living with HIV. Therefore, we created a generalizable and innovative method to robustly identify people living with HIV, preexposure prophylaxis (PrEP) users, postexposure prophylaxis (PEP) users, and people not living with HIV using granular clinical data after the emergence of COVID-19.
Objective: The primary aim of this study was to use computational phenotyping in EHR data to identify people living with HIV (cohort 1), PrEP users (cohort 2), PEP users (cohort 3), or "none of the above" (people not living with HIV; cohort 4) and describe COVID-19-related characteristics among these cohorts.
Methods: We used diagnostic, laboratory measurement, and drug concepts in the National Clinical Cohort Collaborative to create a computational phenotype for the 4 cohorts with confidence levels. For robustness, we conducted a randomly sampled, blinded clinician annotation to assess precision. We calculated the distribution of demographics, comorbidities, and COVID-19 variables among the 4 cohorts.
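The idea of combining evidence domains and then checking precision against clinician annotation can be sketched roughly as follows. This is a minimal illustration, not the study's algorithm: the abstract does not state the actual rules, so the class `PatientEvidence`, the tier names, the "two or more domains" threshold, and the hard-coded `COVID_REGIMENS` set are all assumptions made for the example. The one behavior taken from the abstract is that ritonavir given as part of nirmatrelvir/ritonavir should not count as evidence of HIV treatment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple

# Assumption for illustration: regimens that indicate COVID-19 treatment,
# not HIV care. A real phenotype would use curated drug concept sets.
COVID_REGIMENS = {frozenset({"nirmatrelvir", "ritonavir"})}


@dataclass(frozen=True)
class PatientEvidence:
    """Hypothetical per-patient summary of the three evidence domains."""
    has_hiv_condition: bool  # an HIV diagnostic concept is on record
    has_hiv_lab: bool        # an HIV-related laboratory measurement is on record
    # Antiretroviral-containing regimens, each given as a set of ingredients.
    arv_regimens: Tuple[FrozenSet[str], ...] = ()


def has_hiv_drug_evidence(regimens: Sequence[FrozenSet[str]]) -> bool:
    """True if at least one regimen is not explained by COVID-19 treatment."""
    return any(regimen not in COVID_REGIMENS for regimen in regimens)


def evidence_domains(patient: PatientEvidence) -> Tuple[str, ...]:
    """List which of the three domains support an HIV classification."""
    domains = []
    if patient.has_hiv_condition:
        domains.append("condition")
    if patient.has_hiv_lab:
        domains.append("lab")
    if has_hiv_drug_evidence(patient.arv_regimens):
        domains.append("drug")
    return tuple(domains)


def evidence_tier(domains: Sequence[str]) -> str:
    """Assumed tiers (not the study's): more agreeing domains, more confidence."""
    if len(domains) >= 2:
        return "higher"
    if len(domains) == 1:
        return "lower"
    return "none"


def precision(annotations: Sequence[bool]) -> float:
    """Share of sampled algorithm-positive patients that annotators confirmed.

    Each entry is True when the blinded clinician agreed with the algorithm,
    so the result is true positives / (true positives + false positives).
    """
    if not annotations:
        raise ValueError("at least one annotated patient is required")
    return sum(annotations) / len(annotations)


if __name__ == "__main__":
    covid_treated = PatientEvidence(
        has_hiv_condition=True,
        has_hiv_lab=False,
        arv_regimens=(frozenset({"nirmatrelvir", "ritonavir"}),),
    )
    domains = evidence_domains(covid_treated)
    print(domains, evidence_tier(domains))
```

In this sketch, a patient with a diagnostic code and only a nirmatrelvir/ritonavir exposure is supported by a single domain, which is the kind of case where a diagnosis-only or drug-only rule could misclassify.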
Results: We identified 132,664 people living with HIV with a high level of confidence, 36,088 PrEP users, 4120 PEP users, and 20,639,675 people not living with HIV. Most people living with HIV were identified by a combination of medical conditions, laboratory measurements, and drug exposures (74,809/132,664, 56.4%), followed by laboratory measurements and drug exposures (15,241/132,664, 11.5%) and then by medical conditions and drug exposures (14,595/132,664, 11%). A higher proportion of people living with HIV experienced COVID-19-related hospitalization (4650/132,664, 3.5%) or mortality (828/132,664, 0.6%) and all-cause mortality (2083/132,664, 1.6%) compared to other cohorts.
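The reported percentages can be rechecked from the counts in the Results. The short sketch below only redoes that arithmetic; the dictionary keys and the helper name `percent` are labels chosen for this example, while the counts and the denominator of 132,664 people living with HIV come from the abstract.

```python
# Denominator: people living with HIV identified with high confidence.
TOTAL_PLWH = 132_664

# Numerators as reported in the Results; keys are illustrative labels.
PLWH_COUNTS = {
    "condition + lab + drug": 74_809,
    "lab + drug": 15_241,
    "condition + drug": 14_595,
    "COVID-19-related hospitalization": 4_650,
    "COVID-19-related mortality": 828,
    "all-cause mortality": 2_083,
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to 1 decimal."""
    return round(100 * numerator / denominator, 1)


if __name__ == "__main__":
    for label, count in PLWH_COUNTS.items():
        print(f"{label}: {count}/{TOTAL_PLWH} = {percent(count, TOTAL_PLWH)}%")
```

The three identification pathways listed account for roughly 79% of the cohort; the abstract does not break down the remainder.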
Conclusions: Using an extensive phenotyping algorithm leveraging granular data in an EHR repository, we have identified people living with HIV, people not living with HIV, PrEP users, and PEP users. Our findings offer transferable lessons to optimize future EHR phenotyping for these cohorts.
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.