Application of an Externally Developed Algorithm to Identify Research Cases and Controls from Electronic Health Record Data: Failures and Successes.

IF 2.1 | Zone 2, Medicine | Q4 Medical Informatics
Nelly Estefanie Garduno Rapp, Simone D Herzberg, Henry H Ong, Cindy Kao, Christoph Ulrich Lehmann, Srushti Gangireddy, Nitin B Jain, Ayush Giri
Applied Clinical Informatics. Journal Article. Published 2025-01-24. DOI: 10.1055/a-2524-5216. https://doi.org/10.1055/a-2524-5216
Citations: 0

Abstract

Background: The use of Electronic Health Records (EHRs) in research demands robust, interoperable systems. By linking biorepositories to EHR algorithms, researchers can efficiently identify cases and controls for large observational studies (e.g., Genome-Wide Association Studies (GWAS)). This is critical for ensuring efficient and cost-effective research. However, the lack of standardized metadata and algorithms across different EHRs complicates their sharing and application. Our study presents an example of a successful implementation and validation process.

Objective: To implement and validate a rule-based algorithm from a tertiary medical center in Tennessee to classify cases and controls from a research study on rotator cuff tear nested within a tertiary medical center in North Texas and to assess the algorithm's performance.

Methods: We applied a phenotypic algorithm (designed and validated in a tertiary medical center in Tennessee) to EHR data from 492 patients enrolled in a case-control study and recruited from a tertiary medical center in North Texas. The algorithm leveraged ICD (International Classification of Diseases) and CPT (Current Procedural Terminology) codes to identify case and control status for degenerative rotator cuff tears. A manual review was conducted to compare the algorithm's classification with a previously recorded gold standard documented by clinical researchers.
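
As a rough illustration of what a rule-based classifier driven by ICD and CPT codes can look like, the sketch below labels a patient as a case when the record contains a qualifying diagnosis or procedure code. It is a minimal sketch, not the study's algorithm: the code sets, the `Patient` structure, and the `classify` function are assumptions made for this example, and the abstract does not list the actual data dictionary or rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative code sets only (assumption); NOT the study's data dictionary.
CASE_ICD_PREFIXES = ("M75.1",)  # ICD-10-CM family: rotator cuff tear or rupture, not specified as traumatic
CASE_CPT_CODES = {"29827"}      # CPT: arthroscopic rotator cuff repair


@dataclass
class Patient:
    """Hypothetical container for the codes found in one patient's EHR."""
    patient_id: str
    icd_codes: set[str] = field(default_factory=set)
    cpt_codes: set[str] = field(default_factory=set)


def classify(patient: Patient) -> str:
    """Return "case" if any qualifying diagnosis or procedure code is present, else "control"."""
    has_diagnosis = any(code.startswith(CASE_ICD_PREFIXES) for code in patient.icd_codes)
    has_procedure = bool(patient.cpt_codes & CASE_CPT_CODES)
    return "case" if (has_diagnosis or has_procedure) else "control"
```

A production phenotyping algorithm would typically add exclusion rules and an "unclassified" outcome; those are omitted here because the abstract does not describe them.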

Results: Initially, the algorithm correctly identified 398 of 492 patients (80.9%) as cases or controls. After fine-tuning the algorithm and correcting errors in our gold standard dataset, we calculated a sensitivity of 0.94 and a specificity of 0.76.
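
For readers less familiar with these metrics, the sketch below shows how agreement, sensitivity, and specificity are computed when algorithm output is compared with a gold standard. The function names and the five toy labels are assumptions for illustration; the abstract reports only the summary figures, not the underlying confusion matrix, so no study counts other than 398 and 492 are used.

```python
def confusion_counts(gold, predicted, positive="case"):
    """Count true/false positives and negatives for paired label sequences."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1
        else:
            if g == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of gold-standard cases that the algorithm labels as cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of gold-standard controls that the algorithm labels as controls."""
    return tn / (tn + fp)


# Toy labels (made up for illustration, not study data).
gold = ["case", "case", "case", "control", "control"]
pred = ["case", "case", "control", "control", "case"]
tp, fp, tn, fn = confusion_counts(gold, pred)

# Initial agreement reported in the abstract: 398 of 492 patients.
initial_agreement = 398 / 492
```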

Discussion: The implementation of the algorithm presented challenges due to the variability in coding practices between medical centers. To enhance performance, we refined the algorithm's data dictionary by incorporating additional codes. The process highlighted the need for meticulous code verification and standardization in multi-center studies.
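
One plausible way to find codes worth adding to a data dictionary is to tally the codes that appear in gold-standard cases the algorithm missed but that the dictionary does not yet contain. The sketch below does this; it is an assumption about how such a refinement could be approached, not the procedure the authors describe, and `candidate_codes` and its inputs are hypothetical.

```python
from collections import Counter


def candidate_codes(missed_case_codes, dictionary):
    """Rank codes seen in missed cases that are absent from the current dictionary.

    missed_case_codes: iterable of per-patient code collections (false negatives).
    dictionary: set of codes the algorithm already recognizes.
    Returns (code, patient_count) pairs, most frequent first.
    """
    counts = Counter()
    for codes in missed_case_codes:
        # Count each code at most once per patient.
        counts.update(set(codes) - dictionary)
    return counts.most_common()
```

Any candidate surfaced this way would still need clinical review before being added, since a code that is frequent among missed cases may also be common among controls.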

Conclusion: Sharing case-control algorithms boosts EHR research. Our rule-based algorithm improved multi-site patient identification and revealed 12 data entry errors, helping validate our results.

Source journal: Applied Clinical Informatics (Medical Informatics)
CiteScore: 4.60
Self-citation rate: 24.10%
Articles published: 132
About the journal: ACI is the third Schattauer journal dealing with biomedical and health informatics. It perfectly complements our other journals, Methods of Information in Medicine and the Yearbook of Medical Informatics. With the Yearbook of Medical Informatics being the “Milestone” or state-of-the-art journal and Methods of Information in Medicine being the “Science and Research” journal of IMIA, ACI intends to be the “Practical” journal of IMIA.