Robustness of Multiple Imputation Methods for Missing Risk Factor Data from Electronic Medical Records for Observational Studies.

Impact Factor: 5.9 | JCR: Q1 (Computer Science)
Journal of Healthcare Informatics Research | Pub Date: 2022-09-10 | eCollection Date: 2022-12-01 | DOI: 10.1007/s41666-022-00119-w
Sanjoy K Paul, Joanna Ling, Mayukh Samanta, Olga Montvida
Citations: 2

Abstract

Evaluating appropriate methodologies for imputation of missing outcome data from electronic medical records (EMRs) is crucial but lacking for observational studies. Using US EMR in people with type 2 diabetes treated over 12 and 24 months with dipeptidyl peptidase 4 inhibitors (DPP-4i, n = 38,483) and glucagon-like peptide 1 receptor agonists (GLP-1RA, n = 8,977), predictors of missingness of disease biomarker (HbA1c) were explored. Robustness of multiple imputation (MI) by chained equations, two-fold MI (MI-2F) and MI with Monte Carlo Markov Chain were compared to complete case analyses for drawing inferences. Compared to younger people (age quartile Q1), those in age quartile Q3 and Q4 were less likely to have missing HbA1c by 25-32% (range of OR CI: 0.55-0.88) at 6-month follow-up and by 26-39% (range of OR CI: 0.50-0.80) at 12-month follow-up. People with HbA1c ≥ 7.5% at baseline were 12% (OR CI: 0.83, 0.93) and 14% (OR CI: 0.77, 0.97) less likely to have missing data at 6-month follow-up in the DPP-4i and GLP-1RA groups, respectively. All imputation methods provided similar HbA1c distributions during follow-up as observed with complete case analyses. The clinical inferences based on absolute change in HbA1c and by proportion of people reducing HbA1c to a clinically acceptable level (≤ 7%) were also similar between imputed data and complete case analyses. MI-2F method provided marginally smaller mean difference between observed and imputed data with relatively smaller standard error of difference, compared to other methods, while evaluating for consistency through artificial within-sample analyses. The established MI techniques can be reliably employed for missing outcome data imputations in large EMR-based relational databases, leading to efficiently designing and drawing robust clinical inferences in pharmaco-epidemiological studies.

Supplementary information: The online version contains supplementary material available at 10.1007/s41666-022-00119-w.
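
The "artificial within-sample" consistency check mentioned in the abstract can be illustrated with a minimal sketch: hide a fraction of observed follow-up values, impute them, and summarise the difference between imputed and observed values. Everything below is an assumption made for illustration, not the authors' procedure: the data are synthetic, the function names (`fit_line`, `within_sample_check`) and parameters are hypothetical, and a simple single-predictor stochastic regression imputation stands in for the chained-equations, two-fold, and MCMC methods compared in the study.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def within_sample_check(baseline, followup, mask_frac=0.2, m=5, seed=0):
    """Mask observed follow-up values, impute them m times, compare to truth.

    Returns the mean difference (imputed - observed) over the masked values
    and the standard error of that difference.
    """
    rng = random.Random(seed)
    n = len(baseline)
    masked = set(rng.sample(range(n), int(mask_frac * n)))
    kept = [i for i in range(n) if i not in masked]
    # Fit the imputation model only on values that remain "observed".
    a, b, sd = fit_line([baseline[i] for i in kept],
                        [followup[i] for i in kept])
    diffs = []
    for i in sorted(masked):
        # Each draw is the regression prediction plus residual noise;
        # the m draws are averaged before comparing with the hidden value.
        draws = [a + b * baseline[i] + rng.gauss(0, sd) for _ in range(m)]
        diffs.append(statistics.fmean(draws) - followup[i])
    mean_diff = statistics.fmean(diffs)
    se_diff = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean_diff, se_diff


# Synthetic HbA1c-like values (percent); not taken from the study.
data_rng = random.Random(42)
baseline = [data_rng.gauss(8.5, 1.2) for _ in range(500)]
followup = [0.6 * v + 2.4 + data_rng.gauss(0, 0.8) for v in baseline]

mean_diff, se_diff = within_sample_check(baseline, followup)
print(f"mean difference: {mean_diff:.3f}, SE of difference: {se_diff:.3f}")
```

A smaller mean difference with a smaller standard error would indicate closer agreement between imputed and observed values, which is the kind of comparison the abstract reports across methods. A faithful replication would use a full MI implementation and pool estimates across imputed datasets rather than averaging draws per value.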

Source journal: Journal of Healthcare Informatics Research
Subject category: Computer Science (Computer Science Applications)
CiteScore: 13.60
Self-citation rate: 1.70%
Articles published: 12
About the journal: Journal of Healthcare Informatics Research serves as a publication venue for innovative technical contributions highlighting analytics, systems, and human factors research in healthcare informatics. The journal is concerned with the application of computer science principles, information science principles, information technology, and communication technology to address problems in healthcare and everyday wellness. It highlights the most cutting-edge technical contributions in computing-oriented healthcare informatics. The journal covers three major tracks: (1) analytics, which focuses on data analytics, knowledge discovery, and predictive modeling; (2) systems, which focuses on building healthcare informatics systems (e.g., architecture, framework, design, engineering, and application); (3) human factors, which focuses on understanding users or context, interface design, health behavior, and user studies of healthcare informatics applications.

Topics include but are not limited to:
· healthcare software architecture, framework, design, and engineering
· electronic health records
· medical data mining
· predictive modeling
· medical information retrieval
· medical natural language processing
· healthcare information systems
· smart health and connected health
· social media analytics
· mobile healthcare
· medical signal processing
· human factors in healthcare
· usability studies in healthcare
· user-interface design for medical devices and healthcare software
· health service delivery
· health games
· security and privacy in healthcare
· medical recommender system
· healthcare workflow management
· disease profiling and personalized treatment
· visualization of medical data
· intelligent medical devices and sensors
· RFID solutions for healthcare
· healthcare decision analytics and support systems
· epidemiological surveillance systems and intervention modeling
· consumer and clinician health information needs, seeking, sharing, and use
· semantic Web, linked data, and ontology
· collaboration technologies for healthcare
· assistive and adaptive ubiquitous computing technologies
· statistics and quality of medical data
· healthcare delivery in developing countries
· health systems modeling and simulation
· computer-aided diagnosis