Error rates of data processing methods in clinical research: A systematic review and meta-analysis of manuscripts identified through PubMed

IF 3.7 | CAS Zone 2 (Medicine) | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Maryam Y. Garza, Tremaine Williams, Songthip Ounpraseuth, Zhuopei Hu, Jeannette Lee, Jessica Snowden, Anita C. Walden, Alan E. Simon, Lori A. Devlin, Leslie W. Young, Meredith N. Zozus
International Journal of Medical Informatics, Volume 195, Article 105749 (published 4 December 2024)
DOI: 10.1016/j.ijmedinf.2024.105749
https://www.sciencedirect.com/science/article/pii/S138650562400412X

Abstract

Background

In clinical research, preventing data errors is paramount to ensuring the reproducibility of trial results and the safety and efficacy of the resulting interventions. Over the last 40 years, empirical assessments of data accuracy in clinical research have been reported; however, these results have rarely been synthesized systematically. Apart from a few notable exceptions, little evidence is available on the relative accuracy of different data processing methods.

Methods

A systematic review of the literature identified through PubMed was performed to find studies that evaluated the quality of data obtained through data processing methods typically used in clinical trials. Quantitative information on data accuracy was abstracted from the manuscripts and pooled. Meta-analyses of single proportions, based on the Freeman-Tukey transformation method and on the generalized linear mixed model (GLMM) approach, were used to derive an overall error-rate estimate for each data processing method so that the methods could be compared.
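
The Freeman-Tukey pooling step can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes each study contributes a count of erroneous fields and a count of fields inspected, applies the Freeman-Tukey double-arcsine transformation (halved form, variance 1/(4n + 2)), pools on the transformed scale with a DerSimonian-Laird random-effects model (an assumed between-study variance estimator; the abstract does not name one), and back-transforms with Miller's formula using the harmonic mean sample size. The GLMM approach is not shown, and the function names and example counts are hypothetical.

```python
import math
from statistics import NormalDist


def ft_transform(events: int, n: int) -> tuple[float, float]:
    """Freeman-Tukey double-arcsine transform (halved form) and its variance."""
    t = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (4 * n + 2)


def ft_back(t: float, n: float) -> float:
    """Miller back-transformation from the FT scale to a proportion, given size n."""
    t_min = 0.5 * math.asin(math.sqrt(1 / (n + 1)))                  # value at 0 events
    t_max = 0.5 * (math.asin(math.sqrt(n / (n + 1))) + math.pi / 2)  # value at n events
    if t <= t_min:
        return 0.0
    if t >= t_max:
        return 1.0
    s = math.sin(2 * t)  # strictly positive inside (t_min, t_max)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    p = 0.5 * (1 - math.copysign(1.0, math.cos(2 * t)) * math.sqrt(max(inner, 0.0)))
    return min(max(p, 0.0), 1.0)


def pool_error_rates(studies, level=0.95):
    """Random-effects (DerSimonian-Laird) pooling of (errors, fields) pairs.

    Returns (estimate, lower, upper) as proportions.
    """
    ys, vs = zip(*(ft_transform(e, n) for e, n in studies))
    k = len(ys)
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_harm = k / sum(1 / n for _, n in studies)  # harmonic mean size for back-transform
    return tuple(ft_back(x, n_harm) for x in (mu, mu - z * se, mu + z * se))


if __name__ == "__main__":
    # Hypothetical (errors, fields inspected) pairs, NOT data from the review.
    studies = [(12, 5000), (30, 8000), (5, 4000)]
    est, lo, hi = pool_error_rates(studies)
    print(f"pooled error rate {est:.2%} (95% CI {lo:.2%}, {hi:.2%})")
```

The transformation stabilizes the variance of proportions near zero, which matters here because most reported error rates are well below 1%.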

Results

A total of 93 papers (published from 1978 to 2008) met the inclusion criteria and were categorized according to their data processing methods. The accuracy associated with data processing methods varied widely, with error rates ranging from 2 to 2,784 errors per 10,000 fields (0.02% to 27.84%). Medical record abstraction (MRA) was associated with both high and highly variable error rates, with a pooled error rate of 6.57% (95% CI: 5.51, 7.72). In comparison, the pooled error rates for optical scanning, single-data entry, and double-data entry were 0.74% (95% CI: 0.21, 1.60), 0.29% (95% CI: 0.24, 0.35), and 0.14% (95% CI: 0.08, 0.20), respectively.
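
The results mix two units, errors per 10,000 fields and percentages. The short sketch below converts between them and restates the reported pooled estimates per 10,000 fields. The dataset size of 100,000 fields is a hypothetical value chosen only for illustration, and the helper names are likewise hypothetical.

```python
# Pooled error rates (%) and 95% CIs as reported in the abstract.
POOLED = {
    "MRA": (6.57, 5.51, 7.72),
    "optical scanning": (0.74, 0.21, 1.60),
    "single-data entry": (0.29, 0.24, 0.35),
    "double-data entry": (0.14, 0.08, 0.20),
}


def per_10k_to_percent(errors_per_10k: float) -> float:
    """Convert an error rate in errors per 10,000 fields to a percentage."""
    return errors_per_10k / 100


def percent_to_per_10k(percent: float) -> float:
    """Convert a percentage error rate to errors per 10,000 fields."""
    return percent * 100


def expected_errors(percent: float, n_fields: int) -> float:
    """Expected number of erroneous fields in a dataset of n_fields at a given rate."""
    return percent / 100 * n_fields


if __name__ == "__main__":
    # Range of study-level error rates reported in the abstract.
    for r in (2, 2784):
        print(f"{r} per 10,000 fields = {per_10k_to_percent(r):.2f}%")
    n_fields = 100_000  # hypothetical dataset size, not from the review
    mra = POOLED["MRA"][0]
    for method, (est, lo, hi) in POOLED.items():
        print(f"{method:18s} {percent_to_per_10k(est):5.0f} per 10,000 "
              f"(95% CI {percent_to_per_10k(lo):.0f} to {percent_to_per_10k(hi):.0f}); "
              f"about {expected_errors(est, n_fields):,.0f} errors in {n_fields:,} fields; "
              f"MRA rate is {mra / est:.1f}x this")
```

On the reported point estimates, the pooled MRA error rate is roughly 47 times the pooled double-data entry rate (6.57% versus 0.14%).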

Conclusions

Data processing methods may explain a significant amount of the variability in data accuracy. MRA error rates, for example, were high enough to affect decisions made using the data and could necessitate larger sample sizes to preserve statistical power. Thus, the choice of data processing method is likely to affect process capability and, ultimately, the validity of trial results.
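
To see why a high error rate can cost statistical power, consider a textbook simplification that is an assumption here, not the authors' analysis: a binary endpoint whose recorded value is wrong with probability equal to a method's pooled error rate e, independently of treatment arm (nondifferential, symmetric misclassification). Misclassification pulls both observed event rates toward 50% and shrinks the observed difference by a factor of (1 - 2e), so the standard two-proportion sample-size formula asks for more participants. The true event rates of 30% and 20%, and the helper names, are hypothetical.

```python
import math
from statistics import NormalDist


def observed_proportion(p: float, error_rate: float) -> float:
    """Event proportion seen in the data when each binary outcome is flipped
    with probability error_rate (nondifferential, symmetric misclassification)."""
    return p * (1 - error_rate) + (1 - p) * error_rate


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for comparing two proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)


if __name__ == "__main__":
    p_control, p_treat = 0.30, 0.20  # hypothetical true event rates
    # Pooled point estimates from the abstract, used as assumed misclassification rates.
    for method, rate in [("error-free", 0.0), ("double-data entry", 0.0014),
                         ("single-data entry", 0.0029), ("MRA", 0.0657)]:
        n = n_per_group(observed_proportion(p_control, rate),
                        observed_proportion(p_treat, rate))
        print(f"{method:18s} error rate {rate:.2%}: {n} per group")
```

Under these assumptions, the MRA rate requires roughly 40% more participants per group than error-free data, while the double-data entry rate changes the requirement only marginally.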


Source journal: International Journal of Medical Informatics (Medicine: Computer Science, Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.