Automating Electronic Clinical Data Capture for Quality Improvement and Research: The CERTAIN Validation Project of Real World Evidence.

EGEMS (Washington, DC). Pub Date: 2018-05-22. DOI: 10.5334/egems.211

Emily Beth Devine, Erik Van Eaton, Megan E Zadworny, Rebecca Symons, Allison Devlin, David Yanez, Meliha Yetisgen, Katelyn R Keyloun, Daniel Capurro, Rafael Alfonso-Cristancho, David R Flum, Peter Tarczy-Hornoch

Citations: 10

Abstract

Background: The availability of high-fidelity electronic health record (EHR) data is a hallmark of the learning health care system. Washington State's Surgical Care Outcomes and Assessment Program (SCOAP) is a network of hospitals participating in quality improvement (QI) registries in which data are manually abstracted from EHRs. To create the Comparative Effectiveness Research and Translation Network (CERTAIN), we semi-automated SCOAP data abstraction using a centralized federated data model, created a central data repository (CDR), and assessed whether these data could be used as real world evidence for QI and research.

Objectives: To describe the validation processes, the complexities involved, and the lessons learned.

Methods: Investigators installed a commercial CDR to retrieve and store data from disparate EHRs. Manual and automated abstraction were run in parallel (October 2012 to July 2013) and validated in three phases, using the EHR as the gold standard: 1) ingestion, 2) standardization, and 3) concordance of automated versus manually abstracted cases. Information retrieval statistics were calculated.
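The phase-3 concordance check and the information retrieval statistics can be illustrated with a short sketch. The abstract does not state which retrieval statistics were computed or how values were matched. This sketch therefore assumes exact-match agreement per case and the conventional precision, recall, and F-measure. The function names, case IDs, and values are hypothetical and not taken from CERTAIN.

```python
from dataclasses import dataclass


@dataclass
class RetrievalStats:
    precision: float
    recall: float
    f_measure: float


def concordance(automated: dict[str, str], manual: dict[str, str]) -> float:
    """Share of cases abstracted by both methods whose values agree exactly.

    Exact string equality is an assumed matching rule, not CERTAIN's documented one.
    """
    shared = automated.keys() & manual.keys()
    if not shared:
        raise ValueError("no cases abstracted by both methods")
    agree = sum(1 for case_id in shared if automated[case_id] == manual[case_id])
    return agree / len(shared)


def retrieval_stats(retrieved: set[str], relevant: set[str]) -> RetrievalStats:
    """Precision, recall, and F-measure of retrieved case IDs against a gold-standard set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return RetrievalStats(precision, recall, f_measure)


if __name__ == "__main__":
    # Hypothetical toy data: one value per case from each abstraction method.
    automated = {"c1": "yes", "c2": "no", "c3": "yes", "c4": "no"}
    manual = {"c1": "yes", "c2": "yes", "c3": "yes", "c4": "no"}
    print("concordance:", concordance(automated, manual))
    print(retrieval_stats({"c1", "c2", "c3"}, {"c1", "c3", "c4"}))
```

With these toy inputs, `concordance` returns 0.75 because three of the four shared cases agree. `retrieval_stats` returns precision, recall, and F-measure of about 2/3 each, since two of the three retrieved cases are in the gold-standard set of three.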

Results: Four unaffiliated health systems provided data. Between 6 and 15 percent of data elements were abstracted, 51 to 86 percent of these from structured data and the remainder using natural language processing (NLP). In phase 1, data ingestion from 12 of 20 feeds reached 95 percent accuracy. In phase 2, 55 percent of structured data elements performed with 96 to 100 percent accuracy, and NLP-derived elements with 89 to 91 percent accuracy. In phase 3, concordance ranged from 69 to 89 percent. Information retrieval statistics were consistently above 90 percent.
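Phases 1 and 2 summarize accuracy per unit (per feed in phase 1, per data element in phase 2) against a threshold. The following minimal sketch shows that tally. It assumes accuracy is the share of captured values that exactly match the EHR gold standard. The 0.95 threshold mirrors the phase-1 figure, while the function names and feed data are hypothetical.

```python
def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Share of (captured, gold_standard) value pairs that match exactly.

    Exact equality is an assumed matching rule for illustration only.
    """
    if not pairs:
        raise ValueError("no records to compare")
    matches = sum(1 for captured, gold in pairs if captured == gold)
    return matches / len(pairs)


def units_meeting_threshold(
    units: dict[str, list[tuple[str, str]]], threshold: float
) -> list[str]:
    """Sorted names of units (feeds or data elements) whose accuracy is >= threshold."""
    return sorted(name for name, pairs in units.items() if accuracy(pairs) >= threshold)


if __name__ == "__main__":
    # Hypothetical feeds: each captured value is paired with its EHR gold-standard value.
    feeds = {
        "feed_a": [("1", "1")] * 19 + [("1", "2")],  # 19 of 20 values match
        "feed_b": [("x", "x"), ("x", "y")],          # 1 of 2 values match
    }
    passing = units_meeting_threshold(feeds, threshold=0.95)
    print(f"{len(passing)} of {len(feeds)} feeds reached 95% accuracy: {passing}")
```

Here only `feed_a` (19/20 = 0.95) meets the threshold. The same function applies unchanged to per-element accuracy in phase 2 if the dictionary is keyed by data element instead of feed.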

Conclusions: Semi-automated data abstraction may be useful, although raw data collected as a byproduct of health care delivery are not immediately available for use as real world evidence. New approaches to gathering and analyzing extant data are required.
