Assessing quality and agreement of structured data in automatic versus manual abstraction of the electronic health record for a clinical epidemiology study
J. G. Brazeal, A. Alekseyenko, Hong Li, M. Fugal, K. Kirchoff, Courtney H. Marsh, D. Lewin, Jennifer D. Wu, J. Obeid, Kristin Wallace
Research methods in medicine & health sciences, vol. 2, pp. 168-178. Published 2021-09-01. DOI: 10.1177/26320843211061287 (https://doi.org/10.1177/26320843211061287)
Citation count: 2
Abstract
Objective: We evaluate data agreement between an electronic health record (EHR) sample abstracted by automated characterization and a standard abstracted by manual review.
Study Design and Setting: We obtain data for an epidemiology cohort study in two ways: standard manual abstraction of the EHR, and automated identification of the same patients by a structured algorithm that queries the EHR. Summary measures of agreement (e.g., Cohen’s kappa) are reported for 12 variables commonly used in epidemiological studies.
Results: The best agreement between abstraction methods is observed for demographic characteristics such as age, sex, and race, and for positive history of disease. Agreement is poor for missing data and negative history, which may affect researchers who rely on automated EHR characterization. EHR data quality depends upon providers, who may be influenced by both institutional and federal government documentation guidelines.
Conclusion: Discrepancies in automated EHR abstraction may decrease statistical power and increase bias; caution is therefore warranted when selecting EHR variables for epidemiological studies that use an automated characterization approach. Validation of automated methods must also continue to advance in sophistication alongside other technologies, such as machine learning and natural language processing, that extract unstructured data from the EHR for use in EHR characterization in clinical epidemiology.
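For readers unfamiliar with the agreement measure named in the abstract, Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation and is not the authors' code. The function name `cohens_kappa`, the label values, and the ten toy records are hypothetical, and treating "missing" as its own category is an assumption made only for illustration.

```python
from collections import Counter


def cohens_kappa(manual, automated):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(manual) != len(automated):
        raise ValueError("both abstractions must cover the same patients")
    n = len(manual)
    if n == 0:
        raise ValueError("need at least one paired observation")

    # Observed agreement: share of patients where both methods give the same value.
    p_o = sum(m == a for m, a in zip(manual, automated)) / n

    # Chance agreement: product of the two methods' marginal frequencies,
    # summed over categories (categories absent from one method contribute 0).
    manual_counts = Counter(manual)
    auto_counts = Counter(automated)
    p_e = sum(manual_counts[c] * auto_counts[c] for c in manual_counts) / (n * n)

    if p_e == 1.0:
        # Both methods return one identical label for every patient; kappa is
        # formally undefined here, and this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (NOT from the study): history of a disease for ten
# patients, where the automated query returns "missing" for some records
# that the manual reviewer coded as a negative history.
manual = ["yes", "yes", "no", "no", "no", "yes", "no", "no", "yes", "no"]
automated = ["yes", "yes", "missing", "no", "missing", "yes", "no", "missing", "yes", "no"]

kappa = cohens_kappa(manual, automated)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.55"
```

In this toy example every positive history agrees and all disagreement comes from negatives recorded as missing, which is the kind of pattern the Results describe. For real analyses, an established implementation such as scikit-learn's `cohen_kappa_score` is likely a better choice than this hand-rolled version.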