Keith Marsolo, Lesley Curtis, Laura Qualls, Jennifer Xu, Yinghong Zhang, Thomas Phillips, C. Larry Hill, Gretchen Sanders, Judith C. Maro, Daniel Kiernan, Christine Draper, Kevin Coughlin, Sarah K. Dutcher, José J. Hernández-Muñoz, Monique Falconer
Learning Health Systems, 9(2). Published 2024-10-21. DOI: 10.1002/lrh2.10468. https://onlinelibrary.wiley.com/doi/10.1002/lrh2.10468
Assessing the harmonization of structured electronic health record data to reference terminologies and data completeness through data provenance
Introduction
This study had two objectives: (1) to assess the harmonization of structured electronic health record (EHR) data (laboratory results and medications) to reference terminologies and characterize the severity of any issues; and (2) to identify issues of data completeness by comparing complementary data domains, stratified by time, care setting, and provenance.
Methods
Queries were distributed to 3 Data Partners (DPs). Using harmonization queries, we examined the top 200 laboratory results and medications by volume, identifying outliers and computing summary statistics. The completeness queries examined 4 conditions of interest and related clinical concepts. Counts were generated for each condition, stratified by year, encounter type, and provenance. We analyzed trends over time within and across DPs.
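As a rough illustration of the harmonization check described above, the following Python sketch counts the distinct codes attached to each name, takes the median, and flags names with unusually many codes. It is not the study's query: the record layout, the names and codes, and the `outlier_threshold` value are all hypothetical, and the abstract does not say how outliers were actually defined.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (name, code) pairs; None stands in for a record with a "null" code.
records = [
    ("lab_a", "code_1"),
    ("lab_a", "code_1"),
    ("lab_b", "code_2"),
    ("lab_b", None),
    ("lab_c", "code_3"),
    ("lab_c", "code_4"),
    ("lab_c", "code_5"),
    ("lab_c", "code_6"),
]


def summarize_codes_per_name(rows, outlier_threshold=3):
    """Count distinct non-null codes per name and flag names at or above a threshold.

    The threshold is an assumed cutoff chosen for this example only.
    """
    codes = defaultdict(set)
    null_count = 0
    for name, code in rows:
        if code is None:
            null_count += 1
            codes[name]  # touching the key keeps names that only have null codes
        else:
            codes[name].add(code)
    counts = {name: len(found) for name, found in codes.items()}
    outliers = sorted(n for n, c in counts.items() if c >= outlier_threshold)
    return counts, median(counts.values()), outliers, null_count


counts, median_codes, outliers, null_count = summarize_codes_per_name(records)
print(counts, median_codes, outliers, null_count)
```

The same function could be run with the pairs reversed, as (code, name), to approximate the "vice versa" direction: the number of distinct names mapped to a given code.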
Results
We found that the median number of codes associated with a given laboratory/medication name (and vice versa) generally met expectations, though there were DP-specific issues that resulted in outliers. In addition, there were drastic differences in the percentage of patients with a given concept depending on provenance.
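The provenance comparison can be sketched in the same spirit. The Python example below computes, for a cohort of patients with a condition, the percentage who have a related concept recorded under each provenance value and under any provenance. The cohort, the records, and the provenance labels (`"order"`, `"billing"`) are invented for illustration, and the stratification by year and encounter type is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical cohort of patients with a condition of interest.
cohort = {"p1", "p2", "p3", "p4"}

# Hypothetical (patient_id, provenance) rows for a related clinical concept.
concept_records = [
    ("p1", "order"),
    ("p1", "billing"),
    ("p2", "billing"),
    ("p3", "order"),
    ("p3", "order"),
]


def percent_by_provenance(patients, rows):
    """Percent of the cohort with at least one record, per provenance and overall."""
    if not patients:
        raise ValueError("cohort must not be empty")
    seen = defaultdict(set)
    for patient_id, provenance in rows:
        if patient_id in patients:
            seen[provenance].add(patient_id)
            seen["any"].add(patient_id)
    return {prov: 100.0 * len(ids) / len(patients) for prov, ids in seen.items()}


percentages = percent_by_provenance(cohort, concept_records)
print(percentages)
```

In this toy data each provenance on its own covers half of the cohort, while combining them covers three quarters, which is the kind of gap a single-provenance view would hide.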
Conclusions
The harmonization queries surfaced several mapping errors, as well as issues with overly specific codes and records with “null” codes. The completeness queries demonstrated that having access to multiple types of data provenance provides more robust results than any single provenance type. Harmonization errors between source data and reference terminologies may not be widespread but do exist within common data models (CDMs), affecting tens of thousands or even millions of records. Provenance information can help identify potential completeness issues with EHR data, but only if it is represented in the CDM and then populated by DPs.