Data Cleaning in the Evaluation of a Multi-Site Intervention Project
Gavin Welch, Friedrich von Recklinghausen, Andreas Taenzer, Lucy Savitz, Lisa Weiss
EGEMS (Washington, DC), vol. 5, no. 3, p. 4. Published 2017-12-15. DOI: 10.5334/egems.196. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5983076/pdf/
Citations: 10
Abstract
Context: The High Value Healthcare Collaborative (HVHC) sepsis project was a two-year, multi-site project in which Member health care delivery systems worked on improving sepsis care using a dissemination and implementation framework designed by HVHC. As part of the project evaluation, participating Members provided five data submissions over the project period. Members created data files using a uniform specification, but the data sources and methods used to create the data sets differed. Extensive data cleaning was necessary to obtain a data set usable for the evaluation analysis.
Case description: HVHC was the coordinating center for the project and received and cleaned all data submissions. Submissions received three sequentially more detailed levels of checking by HVHC. The most detailed level evaluated validity by comparing values within-Member over time and between Members. For a subset of episodes, Member-submitted data were compared to matched Medicare claims data.
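To make the within-Member and between-Member comparisons concrete, the sketch below shows one way such checks could be implemented in Python with pandas. It is illustrative only: the column names (member_id, submission, hospital_los_days) and the z-score cutoff are assumptions, not details taken from the project's specification or code.

```python
# Illustrative sketch only; not the paper's actual checking code.
import pandas as pd

def flag_member_drift(df: pd.DataFrame, var: str = "hospital_los_days",
                      z_cutoff: float = 3.0) -> pd.DataFrame:
    """Flag Member/submission means that diverge from that Member's own
    history (within-Member over time) or from the other Members in the
    same submission (between-Member) by more than z_cutoff SDs."""
    summary = (df.groupby(["member_id", "submission"])[var]
                 .agg(["mean", "count"])
                 .reset_index())

    # Within-Member check: compare each submission's mean to the Member's
    # mean and spread across all of that Member's submissions.
    summary["within_member_z"] = summary.groupby("member_id")["mean"].transform(
        lambda s: (s - s.mean()) / s.std(ddof=0) if s.std(ddof=0) else 0.0)

    # Between-Member check: compare each Member's mean in a submission
    # to the other Members' means in that same submission.
    summary["between_member_z"] = summary.groupby("submission")["mean"].transform(
        lambda s: (s - s.mean()) / s.std(ddof=0) if s.std(ddof=0) else 0.0)

    summary["flagged"] = (summary["within_member_z"].abs() > z_cutoff) | \
                         (summary["between_member_z"].abs() > z_cutoff)
    return summary
```

A flagged Member/submission pair would then prompt a follow-up conversation with the people who built that data set, which is the kind of communication the Conclusions section recommends.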
Findings: Inconsistencies in data submissions, particularly for length-of-stay variables, were common in early submissions and decreased with subsequent submissions. Multiple resubmissions were sometimes required to get clean data. Data checking also uncovered a systematic difference in the way Medicare and some Members defined an intensive care unit stay.
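As a hedged illustration of the kind of length-of-stay consistency rules that surface such inconsistencies, the following Python sketch flags episodes whose reported hospital length of stay disagrees with the admission and discharge dates, or whose ICU length of stay exceeds the hospital length of stay. The field names (episode_id, admit_dt, discharge_dt, hospital_los_days, icu_los_days) and the one-day tolerance are assumptions, not taken from the uniform specification.

```python
# Illustrative sketch only; field names and tolerances are assumed.
import pandas as pd

def length_of_stay_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per episode/rule pair that fails a consistency check."""
    # Dates must already be parsed as datetimes (e.g., via pd.to_datetime).
    derived_los = (df["discharge_dt"] - df["admit_dt"]).dt.days

    rules = {
        "hospital LOS disagrees with admit/discharge dates":
            (derived_los - df["hospital_los_days"]).abs() > 1,
        "ICU LOS exceeds hospital LOS":
            df["icu_los_days"] > df["hospital_los_days"],
        "negative length of stay":
            (df["hospital_los_days"] < 0) | (df["icu_los_days"] < 0),
    }

    failures = [
        pd.DataFrame({"episode_id": df.loc[mask, "episode_id"], "rule": name})
        for name, mask in rules.items() if mask.any()
    ]
    if not failures:
        return pd.DataFrame(columns=["episode_id", "rule"])
    return pd.concat(failures, ignore_index=True)
```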
Conclusions: Data checking is critical for ensuring valid analytic results for projects using electronic health record data. It is important to budget sufficient resources for data checking. Interim data submissions and checks help find anomalies early. Data resubmissions should be checked, as fixes can introduce new errors. Communicating with those responsible for creating the data set provides critical information.