Use of Sequential Hot-Deck Imputation for Missing Health Care Systems Data for Population Health Research.

Impact factor 3.3 · Zone 2, Medicine · Q1 HEALTH CARE SCIENCES & SERVICES

Medical Care · Pub Date: 2024-05-01 · Epub Date: 2024-03-28 · DOI: 10.1097/MLR.0000000000001995

Ella A Chrenka, Steven P Dehmer, Michael V Maciosek, Inih J Essien, Bjorn C Westgard

Pages: 319-325 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10997447/pdf/

Citations: 0

Abstract

Electronic medical record (EMR) data present many opportunities for population health research. The use of EMR data for population risk models can be impeded by the high proportion of missingness in key patient variables. Common approaches like complete case analysis and multiple imputation may not be appropriate for some population health initiatives that require a single, complete analytic data set. In this study, we demonstrate a sequential hot-deck imputation (HDI) procedure to address missingness in a set of cardiometabolic measures in an EMR data set. We assessed the performance of sequential HDI within the individual variables and a commonly used composite risk score.

A data set of cardiometabolic measures based on EMR data from 2 large urban hospitals was used to create a benchmark data set with simulated missingness. Sequential HDI was applied, and the resulting data were used to calculate atherosclerotic cardiovascular disease risk scores. The performance of the imputation approach was assessed using a set of metrics to evaluate the distribution and validity of the imputed data.

Of the 567,841 patients, 65% had at least 1 missing cardiometabolic measure. Sequential HDI resulted in the distribution of variables and risk scores that reflected those in the simulated data while retaining correlation. When stratified by age and sex, risk scores were plausible and captured patterns expected in the general population.

The use of sequential HDI was shown to be a suitable approach to multivariate missingness in EMR data. Sequential HDI could benefit population health research by providing a straightforward, computationally nonintensive approach to missing EMR data that results in a single analytic data set.
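The abstract describes sequential HDI only at a high level. The Python sketch below illustrates one common reading of the technique and is not the authors' published procedure. Assumptions: missing values are marked `None`; each missing value is replaced by a randomly drawn observed value (a donor) from records in the same imputation cell; cells start from fully observed keys such as sex and age band; variables are imputed one after another, and each completed variable (optionally binned) is added to the cell definition for the next; a cell with no donors collapses to a coarser one. The function names `sequential_hot_deck` and `cell_of`, the variable names, the binning, and the collapse rule are all hypothetical.

```python
import random
from collections import defaultdict


def cell_of(record, keys, binners):
    """Donor-cell label for a record: the value of each key, binned if a binner is given."""
    return tuple(binners.get(k, lambda v: v)(record[k]) for k in keys)


def sequential_hot_deck(records, order, base_keys, binners=None, seed=0):
    """Fill None values in `records` one variable at a time (illustrative sketch).

    records   -- list of dicts; None marks a missing value
    order     -- variables to impute, in the sequence they are processed
    base_keys -- fully observed variables defining the initial cells (e.g. sex, age band)
    binners   -- optional {variable: function} mapping values to coarse categories
    """
    binners = binners or {}
    rng = random.Random(seed)
    data = [dict(r) for r in records]  # work on copies; the input stays untouched
    keys = list(base_keys)

    for var in order:
        # Donor pools at every level of detail: depth 0 is the whole file,
        # depth len(keys) is the finest cell.
        pools = [defaultdict(list) for _ in range(len(keys) + 1)]
        for r in data:
            if r[var] is not None:  # only observed values act as donors
                for depth in range(len(keys) + 1):
                    pools[depth][cell_of(r, keys[:depth], binners)].append(r[var])

        for r in data:
            if r[var] is not None:
                continue
            # Use the finest cell that has a donor; collapse to coarser cells otherwise.
            for depth in range(len(keys), -1, -1):
                donors = pools[depth].get(cell_of(r, keys[:depth], binners))
                if donors:
                    r[var] = rng.choice(donors)
                    break
            else:
                raise ValueError(f"no observed values for {var!r}")

        # The now-complete variable refines the cells for every later variable.
        keys.append(var)

    return data
```

For example, calling `sequential_hot_deck(patients, ["sbp", "hdl"], ["sex", "age_band"], binners={"sbp": lambda v: "high" if v >= 140 else "normal"})` imputes a hypothetical systolic blood pressure field first, then lets each patient's completed blood-pressure band narrow the donor pool for HDL. Because every filled value is drawn from real observed values, the result is a single completed data set, which is the property the abstract emphasizes; note that one random draw does not by itself reflect imputation uncertainty the way multiple imputation does.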

Source journal

Medical Care (Medicine: Public, Environmental & Occupational Health)

CiteScore: 5.20
Self-citation rate: 3.30%
Articles published: 228
Review time: 3-8 weeks

Journal description: Rated as one of the top ten journals in healthcare administration, Medical Care is devoted to all aspects of the administration and delivery of healthcare. This scholarly journal publishes original, peer-reviewed papers documenting the most current developments in the rapidly changing field of healthcare. This timely journal reports on the findings of original investigations into issues related to the research, planning, organization, financing, provision, and evaluation of health services.