Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies

IF 4.5 | Region 2 (Medicine) | JCR Q2: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics | Pub Date: 2025-04-30 | DOI: 10.1016/j.jbi.2025.104839

Xinyao Jian, Dazheng Zhang, Zehao Yu, Hua Xu, Jiang Bian, Yonghui Wu, Jiayi Tong, Yong Chen

Volume 166, Article 104839. Full text: https://www.sciencedirect.com/science/article/pii/S1532046425000681

Citation count: 0

Abstract

Objectives

In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patient clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, relies on clinicians to determine the true disease status from their assessment of health records. Clinician-labeled phenotypes are labor-intensive to obtain and are typically limited to a small subset of patients; they may also include a third “undecided” category when the phenotype is indeterminate. We aim to effectively integrate the algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies.
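
To make the cost of misclassification concrete, the following minimal sketch shows how an imperfect binary phenotyping algorithm can pull an odds ratio toward 1. It assumes nondifferential misclassification, and the risks, sensitivity, and specificity are hypothetical values chosen for illustration; none of them come from the study.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Probability that the algorithm flags a patient as a case.

    True cases are flagged with probability `sensitivity`;
    true non-cases are flagged with probability 1 - `specificity`.
    """
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def odds_ratio(risk_exposed, risk_unexposed):
    """Odds ratio comparing an exposed group with an unexposed group."""
    odds_exposed = risk_exposed / (1.0 - risk_exposed)
    odds_unexposed = risk_unexposed / (1.0 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical inputs, for illustration only (not taken from the paper).
true_exposed, true_unexposed = 0.20, 0.10
sens, spec = 0.80, 0.95

true_or = odds_ratio(true_exposed, true_unexposed)
obs_or = odds_ratio(
    observed_risk(true_exposed, sens, spec),
    observed_risk(true_unexposed, sens, spec),
)
print(round(true_or, 2), round(obs_or, 2))  # prints: 2.25 1.75
```

With these assumed inputs the true odds ratio of 2.25 shrinks to 1.75 when computed from algorithm-derived labels alone, which is the kind of bias that chart-reviewed labels are meant to correct.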

Material and Methods

We propose an augmented estimation method that combines the binary algorithm-derived phenotypes for the entire cohort with the trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare-disease scenarios. The proposed trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA) was evaluated across a wide range of simulation settings and in real-world applications, using EHR data on Alzheimer’s disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network and cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington.
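
The abstract does not spell out the sampling design, so the following is only a generic sketch of outcome-dependent sampling with inverse-probability weights, not the TriCA procedure itself. It assumes that patients flagged by the algorithm are always sent to chart review while unflagged patients are reviewed with a small probability; the function names, the sampling probabilities, and the synthetic `fake_chart_review` labeler are all hypothetical.

```python
import random
from collections import Counter


def outcome_dependent_sample(algo_labels, p_pos, p_neg, seed=0):
    """Select patients for chart review based on the algorithm-derived label.

    Flagged patients (label 1) are selected with probability p_pos, others
    with probability p_neg. Returns (index, weight) pairs, where the weight
    is the inverse of the selection probability.
    """
    rng = random.Random(seed)
    chosen = []
    for i, label in enumerate(algo_labels):
        p = p_pos if label == 1 else p_neg
        if rng.random() < p:
            chosen.append((i, 1.0 / p))
    return chosen


def weighted_shares(chosen, chart_labels):
    """Inverse-probability-weighted share of each chart-review category."""
    totals = Counter()
    for i, weight in chosen:
        totals[chart_labels[i]] += weight
    total_weight = sum(totals.values())
    return {category: w / total_weight for category, w in totals.items()}


# Synthetic cohort: roughly 5% of patients are flagged by the algorithm.
rng = random.Random(42)
algo = [1 if rng.random() < 0.05 else 0 for _ in range(10_000)]


def fake_chart_review(label):
    """Hypothetical stand-in for a clinician; returns one of three categories."""
    u = rng.random()
    if label == 1:
        return "case" if u < 0.70 else ("undecided" if u < 0.85 else "non-case")
    return "non-case" if u < 0.95 else ("undecided" if u < 0.98 else "case")


# Review every flagged patient but only about 5% of unflagged patients.
chosen = outcome_dependent_sample(algo, p_pos=1.0, p_neg=0.05, seed=1)
chart = {i: fake_chart_review(algo[i]) for i, _ in chosen}
shares = weighted_shares(chosen, chart)
print(len(chosen), shares)
```

Oversampling flagged patients concentrates the review budget where true cases are likely to be found, which matters when the disease is rare; the weights then undo the oversampling so that the category shares still describe the whole cohort.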

Results

Compared to estimation based on random sampling, our augmented method reduced mean squared error by up to 28.3% in simulation studies; compared to estimation using only trinary chart-reviewed phenotypes, our method improved efficiency by up to 33.3% in the ADRD data and 50.8% in the SBCE data.
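
The abstract does not define how these percentages were computed. As a rough guide, the sketch below shows one common convention: percent reduction in mean squared error relative to a reference estimator, and relative efficiency as a ratio of variances. The estimates are made-up numbers for illustration and are not results from the paper.

```python
import statistics


def mean_squared_error(estimates, truth):
    """Average squared distance between repeated estimates and the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def percent_reduction(reference, proposed):
    """Percent by which `proposed` is smaller than `reference`."""
    return 100.0 * (reference - proposed) / reference


# Made-up estimates from four hypothetical simulation replicates.
truth = 2.0
reference_estimates = [1.0, 3.0, 1.0, 3.0]
proposed_estimates = [1.5, 2.5, 1.5, 2.5]

mse_reference = mean_squared_error(reference_estimates, truth)
mse_proposed = mean_squared_error(proposed_estimates, truth)
mse_reduction = percent_reduction(mse_reference, mse_proposed)

# Relative efficiency under the variance-ratio convention (assumed here).
relative_efficiency = (
    statistics.pvariance(reference_estimates)
    / statistics.pvariance(proposed_estimates)
)
print(mse_reduction, relative_efficiency)  # prints: 75.0 4.0
```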

Discussion

Our simulation studies and real-world applications demonstrate that, compared to existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.

Conclusion

The proposed method effectively combines binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, making it applicable to a broader range of applications and enhancing risk factor identification in EHR-based association studies.

Source journal

Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 6.70%

Articles published: 243

Review time: 32 days

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.