Improving classification of myocardial infarction with machine learning in a diverse population.

IF 4.8, Zone 2 (Medicine), JCR Q1, Public, Environmental & Occupational Health
Alicia Chen, Chuan Hong, Yuk Lam Ho, Nicholas Link, Jacqueline P Honerlaw, Vidisha Tanukonda, Ariela R Orkaby, Saadia Qazi, Connor Melley, Ashley Galloway, Lauren Costa, Monika Maripuri, Xuan Wang, Yichi Zhang, Petra Schubert, Tianrun Cai, Zeling He, Vidul A Panickan, Morgan Rosser, Laura Tarko, Sharon Dowell, Candace Feldman, Gail Kerr, J Michael Gaziano, Peter W F Wilson, Kelly Cho, Tianxi Cai, Katherine P Liao
American Journal of Epidemiology. Journal Article. Published 2025-10-07. DOI: 10.1093/aje/kwaf223 (https://doi.org/10.1093/aje/kwaf223)
Citations: 0

Abstract


Phenotype classification with electronic health record (EHR) data is increasingly performed with machine learning (ML); however, the performance of ML algorithms in diverse populations remains understudied. We compared an ICD-based algorithm with an ML phenotyping pipeline for classifying myocardial infarction (MI) in a general population and in a self-reported Black population. We determined the impact of differential performance by replicating a published MI risk factor study with MI defined by either the ICD or the ML algorithm. Individuals followed in the Veterans Health Administration (VHA) EHR with data from 2002 to 2019 were examined: 11,523,175 Veterans, mean age 67.5 years, 93.8% male, 14.3% Black, 79.1% White. MI was classified using a published rule-based ICD algorithm and an ML pipeline, PheCAP, which incorporates natural language processing. Algorithms were trained and validated against n=403 Veterans randomly selected and chart-reviewed for MI (gold standard), with oversampling of self-reported Black Veterans. Among chart-reviewed Veterans, the ICD algorithm had high positive predictive value (PPV) and low sensitivity (all races, PPV: 0.97, sensitivity: 0.17; Black Veterans, PPV: 0.94, sensitivity: 0.24). PheCAP MI had good PPV and higher sensitivity (all races, PPV: 0.90, sensitivity: 0.66; Black Veterans, PPV: 0.81, sensitivity: 0.79). Compared with the ICD algorithm, applying PheCAP MI to the entire VHA population to classify MI provided increased power to replicate findings from the published MI risk factor study.
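
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how PPV and sensitivity are computed against chart-review labels: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The function name `ppv_and_sensitivity`, the `Metrics` type, and the toy label lists are hypothetical and made up for illustration; they are not the study's code or data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Metrics:
    ppv: Optional[float]          # TP / (TP + FP); None if nothing was flagged
    sensitivity: Optional[float]  # TP / (TP + FN); None if there are no true cases


def ppv_and_sensitivity(gold: Iterable[bool], predicted: Iterable[bool]) -> Metrics:
    """Compare an algorithm's calls with gold-standard (chart-review) labels."""
    tp = fp = fn = 0
    for is_case, flagged in zip(gold, predicted):
        if flagged and is_case:
            tp += 1          # true positive
        elif flagged and not is_case:
            fp += 1          # false positive
        elif is_case and not flagged:
            fn += 1          # false negative (missed case)
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return Metrics(ppv, sensitivity)


# Made-up toy labels for illustration only (NOT data from the study).
gold      = [True, True, True, True, False, False, False, False]
rule_like = [True, False, False, False, False, False, False, False]  # conservative
ml_like   = [True, True, True, False, True, False, False, False]     # more inclusive

print(ppv_and_sensitivity(gold, rule_like))  # ppv=1.0, sensitivity=0.25
print(ppv_and_sensitivity(gold, ml_like))    # ppv=0.75, sensitivity=0.75
```

In this toy example the conservative classifier rarely flags a non-case but misses most true cases, while the more inclusive one trades some PPV for higher sensitivity. This is the same kind of trade-off the abstract describes, though the numbers here are invented. To evaluate a subgroup, the same function can be applied to the labels for that subgroup only.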

Source journal
American Journal of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 4.00%
Articles published: 221
Review time: 3-6 weeks
About the journal: The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research. It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.