Validation of algorithms in studies based on routinely collected health data: general principles.

IF 5 | Zone 2, Medicine | JCR Q1: Public, Environmental & Occupational Health
Vera Ehrenstein, Maja Hellfritzsch, Johnny Kahlert, Sinéad M Langan, Hisashi Urushihara, Danica Marinac-Dabic, Jennifer L Lund, Henrik Toft Sørensen, Eric I Benchimol
American Journal of Epidemiology. Journal Article. Published 2024-11-04. DOI: 10.1093/aje/kwae071 (https://doi.org/10.1093/aje/kwae071)

Abstract

Clinicians, researchers, regulators, and other decision-makers increasingly rely on evidence from real-world data (RWD), including data routinely accumulating in health and administrative databases. RWD studies often rely on algorithms to operationalize variable definitions. An algorithm is a combination of codes or concepts used to identify persons with a specific health condition or characteristic. Establishing the validity of algorithms is a prerequisite for generating valid study findings that can ultimately inform evidence-based health care. In this paper, we aim to systematize terminology, methods, and practical considerations relevant to the conduct of validation studies of RWD-based algorithms. We discuss measures of algorithm accuracy, gold/reference standards, study size, prioritization of accuracy measures, algorithm portability, and implications for interpretation. Information bias is common in epidemiologic studies, underscoring the importance of transparency in decisions regarding the choice and prioritization of measures of algorithm validity. The validity of an algorithm should be judged in the context of a data source, and one size does not fit all. Prioritizing validity measures within a given data source depends on the role of a given variable in the analysis (eligibility criterion, exposure, outcome, or covariate). Validation work should be part of routine maintenance of RWD sources. This article is part of a Special Collection on Pharmacoepidemiology.
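
As an illustration of what "measures of algorithm accuracy" can look like in practice, the sketch below cross-classifies algorithm results against a reference standard and computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The abstract does not name specific measures, so the choice of these four conventional ones is an assumption, and the function name, input layout, and sample counts are hypothetical.

```python
from typing import Iterable, Optional


def accuracy_measures(
    pairs: Iterable[tuple[bool, bool]],
) -> dict[str, Optional[float]]:
    """Cross-classify (algorithm_positive, reference_positive) pairs into a
    2x2 table and return four conventional accuracy measures.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp = fp = fn = tn = 0
    for algorithm_positive, reference_positive in pairs:
        if algorithm_positive and reference_positive:
            tp += 1  # algorithm and reference standard both positive
        elif algorithm_positive:
            fp += 1  # flagged by the algorithm, not confirmed
        elif reference_positive:
            fn += 1  # missed by the algorithm
        else:
            tn += 1  # both negative

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # Return None rather than raise when a margin of the table is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up validation sample: 8 true positives, 2 false positives,
# 1 false negative, 9 true negatives.
sample = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 9
)
measures = accuracy_measures(sample)
```

PPV and NPV depend on how common the condition is in the population being studied, which is one reason the same algorithm may perform differently from one data source to another.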

Source journal
American Journal of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 4.00%
Articles published: 221
Review time: 3-6 weeks
About the journal: The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research. It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.