A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank.

Impact factor: 2.5; JCR Q2 (Health Care Sciences & Services)
Do Hyun Kim, Aubrey Jensen, Kelly Jones, Sridharan Raghavan, Lawrence S Phillips, Adriana Hung, Yan V Sun, Gang Li, Peter Reaven, Hua Zhou, Jin J Zhou
Journal: JAMIA Open, 6(1), ooad006. Published: 2023-04-01. Publication type: Journal Article. DOI: https://doi.org/10.1093/jamiaopen/ooad006. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9912368/pdf/
Citations: 1

Abstract

Objective: Modern healthcare data reflect massive multi-level and multi-scale information collected over many years. The majority of the existing phenotyping algorithms use case-control definitions of disease. This paper aims to study the time to disease onset and progression and identify the time-varying risk factors that drive them.

Materials and methods: We developed an algorithmic approach to phenotyping the incidence of diseases by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and their censoring times. The procedure comprised including relevant terms and existing phenotypes; excluding generic, rare, or semantically distant terms; forward-mapping terminology terms; and expert review. We applied our approach to phenotyping diabetes complications, including a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR), in the UKB study.
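
As an illustration of what "defining events, event dates, and their censoring time" can mean in practice, the following is a minimal sketch and not the authors' pipeline. The record layout, the `Record` and `Outcome` types, the `time_to_event` function, and the single administrative censoring date are all assumptions made for this example. It takes the earliest qualifying code across sources as the event date, drops participants whose first qualifying code falls on or before baseline (prevalent cases), and censors everyone else at a fixed date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Record:
    """One coded entry from any linked source (hypothetical layout)."""
    participant_id: str
    source: str       # e.g. "primary_care" or "hospital"; kept for provenance only
    code: str         # clinical code, assumed already mapped to one terminology
    event_date: date


@dataclass(frozen=True)
class Outcome:
    participant_id: str
    time_days: int    # days from baseline to event or to censoring
    event: bool       # True if an incident event was observed


def time_to_event(
    records: Iterable[Record],
    baseline: Dict[str, date],
    phenotype_codes: Set[str],
    censor_date: date,
) -> List[Outcome]:
    """Derive incident-event outcomes under the assumptions stated above."""
    # Earliest qualifying code per participant, pooled over all sources.
    first_event: Dict[str, date] = {}
    for r in records:
        if r.code not in phenotype_codes:
            continue
        prev = first_event.get(r.participant_id)
        if prev is None or r.event_date < prev:
            first_event[r.participant_id] = r.event_date

    outcomes: List[Outcome] = []
    for pid, start in baseline.items():
        onset = first_event.get(pid)
        if onset is not None and onset <= start:
            continue  # prevalent at baseline: not at risk of an incident event
        if onset is not None and onset <= censor_date:
            outcomes.append(Outcome(pid, (onset - start).days, True))
        else:
            # No qualifying event within follow-up: censor administratively.
            outcomes.append(Outcome(pid, (censor_date - start).days, False))
    return outcomes
```

A real analysis would also need participant-specific censoring (for example death or loss to follow-up) and source-specific coverage end dates, which this sketch leaves out.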

Results: We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D), and 40 193 had type 2 diabetes (T2D). A total of 23 833 diabetes subjects had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. The risk prediction performance for each outcome was assessed, and our results are consistent with the prediction area under the ROC (receiver operating characteristic) curve (AUC) of standard risk prediction models using cohort studies.
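
For readers unfamiliar with the AUC, the sketch below computes it for a binary outcome as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, counting ties as one half. This is an illustrative assumption about the metric's simplest form: the function name `roc_auc` is hypothetical, and the abstract does not say whether the authors used this form or a time-dependent variant suited to censored data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in cohort size; it is written this way for clarity, and a rank-based implementation would be preferable at UKB scale.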

Discussion and conclusion: Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and the definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously to study disease progression.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks