Harmonizing units and values of quantitative data elements in a very large nationally pooled electronic health record (EHR) dataset

K. Bradwell, J. Wooldridge, B. Amor, T. Bennett, A. Anand, C. Bremer, Y. J. Yoo, Zhenglong Qian, Steven G. Johnson, E. Pfaff, A. Girvin, A. Manna, Emily Niehaus, Stephanie S. Hong, X. Zhang, R. Zhu, Mark Bissell, N. Qureshi, J. Saltz, M. Haendel, C. Chute, H. Lehmann, R. Moffitt
Journal: Journal of the American Medical Informatics Association : JAMIA
Published: 2022-04-18
DOI: 10.1093/jamia/ocac054 (https://doi.org/10.1093/jamia/ocac054)
Citations: 5

Abstract

Objective: The goals of this study were to harmonize data from electronic health records (EHRs) into common units and to impute units that were missing.

Materials and Methods: The source was the National COVID Cohort Collaborative (N3C) table of laboratory measurement data: over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research and developed a unit-harmonization pipeline comprising (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside the accepted value range for the variable. Where all of a data partner's results for a lab test were missing units, we compared its values with the pooled values of all data partners, using the Kolmogorov-Smirnov test.

Results: Of the concepts without missing values, we harmonized 88.1% of the values, and imputed units for 78.2% of records where units were absent (41% of contributors' records lacked units).

Discussion: The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unique properties of centralized data are harnessed to enable unit inference.

Conclusion: The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.
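The conversion, range-filtering, and unit-inference steps described under Materials and Methods can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the variable `body_weight`, its conversion factors and accepted range, and the helper names are hypothetical and are not taken from the N3C pipeline. The abstract says only that a Kolmogorov-Smirnov test was used to compare a partner's values with the pooled values; picking the candidate unit with the smallest KS statistic is a simplified decision rule chosen for this example.

```python
from bisect import bisect_right

# Hypothetical lookup tables (illustrative values, not the paper's).
# (variable, source unit) -> multiplicative factor to the canonical unit (kg).
CONVERSION_FACTOR = {
    ("body_weight", "kg"): 1.0,
    ("body_weight", "g"): 0.001,
    ("body_weight", "lb"): 0.45359237,
}
# Accepted value range per variable, expressed in the canonical unit.
ACCEPTED_RANGE = {"body_weight": (0.1, 500.0)}


def harmonize(variable, value, unit):
    """Convert a value to the canonical unit.

    Returns None when no conversion is known for the unit or when the
    converted value falls outside the accepted range.
    """
    factor = CONVERSION_FACTOR.get((variable, unit))
    if factor is None:
        return None
    converted = value * factor
    low, high = ACCEPTED_RANGE[variable]
    return converted if low <= converted <= high else None


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def infer_unit(variable, unitless_values, pooled_canonical_values):
    """Guess the unit of unlabeled values.

    Each candidate unit is tried in turn: the values are converted as if
    they were recorded in that unit, then compared with pooled values that
    are already in the canonical unit. The candidate with the smallest KS
    statistic wins (an assumed rule for this sketch).
    """
    best_unit, best_gap = None, float("inf")
    for (var, unit), factor in CONVERSION_FACTOR.items():
        if var != variable:
            continue
        converted = [v * factor for v in unitless_values]
        gap = ks_statistic(converted, pooled_canonical_values)
        if gap < best_gap:
            best_unit, best_gap = unit, gap
    return best_unit, best_gap


# Usage: pooled weights in kg, and one partner's values that lack units.
pooled = [60.0, 72.5, 80.0, 55.0, 90.0, 68.0, 77.0, 101.0]
site_values = [150.0, 165.0, 132.0, 198.0, 176.0, 121.0]
guessed_unit, gap = infer_unit("body_weight", site_values, pooled)
```

A production version would more likely use a library routine such as `scipy.stats.ks_2samp`, apply a significance or distance threshold before accepting a guess, and send each inferred unit through the same clinical review as the conversion formulas.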