Evaluating the Performance and Potential Bias of Predictive Models for Detection of Transthyretin Cardiac Amyloidosis

JACC Advances, Volume 4, Issue 8, Article 101901. Published: 2025-07-05. DOI: 10.1016/j.jacadv.2025.101901

Jonathan Hourmozdi MD, MSAI, MA; Nicholas Easton MSAI; Simon Benigeri MSAI; James D. Thomas MD; Akhil Narang MD; David Ouyang MD; Grant Duffy BS; Ross Upton PhD; Will Hawkes PhD; Ashley Akerman PhD; Ike Okwuosa MD; Adrienne Kline MD, PhD; Abel N. Kho MD; Yuan Luo PhD; Sanjiv J. Shah MD; Faraz S. Ahmad MD, MS



Abstract

Background

Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with artificial intelligence and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared.

Objectives

The aim of this study was to compare the performance of 4 algorithms for ATTR-CM detection in a heart failure population and assess the risk for harms due to model bias.

Methods

We identified patients with ATTR-CM in an integrated health system from 2010 to 2022 and age- and sex-matched them to controls with heart failure to achieve a target ATTR-CM prevalence of 5%. We compared the performance of a claims-based random forest model (Huda et al model), a regression-based score (Mayo ATTR-CM Score), and 2 deep learning echo models (EchoNet-LVH and EchoGo Amyloidosis). We assessed bias using standard fairness metrics.
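The 5% target follows from simple arithmetic: if p is the target prevalence, the number of controls needed is cases × (1 − p) / p, which is 19 controls per case at p = 0.05. The abstract does not say how matched controls were drawn, so the sketch below assumes a hypothetical greedy rule (same sex, age within ±5 years, sampling without replacement). `Patient`, `controls_needed`, `prevalence`, and `match_controls` are illustrative names, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    sex: str  # e.g. "F" or "M"


def controls_needed(n_cases: int, target_prevalence: float) -> int:
    """Controls required so that cases / (cases + controls) equals the target."""
    if not 0 < target_prevalence < 1:
        raise ValueError("target_prevalence must be in (0, 1)")
    # p = cases / (cases + controls)  =>  controls = cases * (1 - p) / p
    return round(n_cases * (1 - target_prevalence) / target_prevalence)


def prevalence(n_cases: int, n_controls: int) -> float:
    """Observed case prevalence in the assembled cohort."""
    return n_cases / (n_cases + n_controls)


def match_controls(cases, pool, target_prevalence=0.05, age_caliper=5, seed=0):
    """Hypothetical greedy matching on sex and age (+/- age_caliper years),
    drawing controls without replacement; not the study's documented procedure."""
    per_case = controls_needed(1, target_prevalence)  # 19 controls per case at 5%
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)  # reproducible random order among eligible controls
    used, matched = set(), []
    for case in cases:
        picks = [
            c for c in candidates
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= age_caliper
        ][:per_case]
        used.update(c.pid for c in picks)  # each control is used at most once
        matched.extend(picks)
    return matched
```

For the reported cohort, `controls_needed(176, 0.05)` gives 3,344, and `prevalence(176, 3192)` is about 0.052 (5.2%), consistent with the stated 5% target. A real matching step would also need a policy for cases with too few eligible controls; this sketch simply returns fewer matches for them.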

Results

The analytical cohort included 176 confirmed cases of ATTR-CM and 3,192 control patients; 79.2% self-identified as White and 9.0% as Black. The Huda et al model performed poorly (AUC: 0.49). Both deep learning echo models had a higher AUC than the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P < 0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black.
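The AUC comparisons above use DeLong's test for two correlated ROC curves, that is, two models scored on the same patients. The following is a minimal NumPy/SciPy sketch of the standard DeLong variance estimator, assuming binary labels and continuous scores from two models evaluated on one cohort. `delong_paired` is a hypothetical helper, and any data passed to it here would be synthetic, not study data.

```python
import numpy as np
from scipy.stats import norm


def _placements(pos, neg):
    """DeLong structural components for one model.

    psi[i, j] = 1 if positive i outranks negative j, 0.5 on ties, else 0.
    Returns V10 (mean over negatives, one value per positive) and
    V01 (mean over positives, one value per negative).
    """
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_paired(y_true, score_a, score_b):
    """Two-sided DeLong test of AUC_a == AUC_b on the same cases and controls."""
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(score_a, dtype=float), np.asarray(score_b, dtype=float)):
        a, b = _placements(s[y], s[~y])
        v10.append(a)
        v01.append(b)
        aucs.append(a.mean())  # AUC = mean of psi over all (positive, negative) pairs
    m, n = y.sum(), (~y).sum()
    s10 = np.cov(np.vstack(v10))  # 2 x 2 covariance across positives (ddof=1)
    s01 = np.cov(np.vstack(v01))  # 2 x 2 covariance across negatives (ddof=1)
    cov = s10 / m + s01 / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    p = 2 * norm.sf(abs(z))  # two-sided p-value from the standard normal
    return aucs[0], aucs[1], z, p
```

The pairwise comparison matrix is m × n, which stays small at this cohort size (176 × 3,192); faster midrank-based formulations of the same estimator exist for much larger samples.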

Conclusions

Deep learning, echo-based models to detect ATTR-CM demonstrated the best overall discrimination compared with the 2 other models in external validation, with low risk of harms due to racial bias.
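Equal opportunity, as commonly defined in the fairness literature, asks whether true positive rates (sensitivity) are similar across groups at the model's operating threshold. The abstract does not report the threshold or tolerance the authors used, so the sketch below assumes a hypothetical decision threshold and a TPR-ratio tolerance of 0.8 to 1.25, a convention borrowed from the "four-fifths" rule. `equal_opportunity_audit` and `true_positive_rate` are illustrative names.

```python
import numpy as np


def true_positive_rate(y_true, y_pred):
    """Sensitivity: flagged positives / all true positives (NaN if no positives)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    positives = y_true.sum()
    if positives == 0:
        return float("nan")
    return (y_true & y_pred).sum() / positives


def equal_opportunity_audit(y_true, scores, groups, threshold,
                            reference, protected, bounds=(0.8, 1.25)):
    """Compare TPRs of a protected group and a reference group at one threshold.

    The threshold and the ratio bounds are assumptions for illustration.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores, dtype=float) >= threshold  # binarize model output
    groups = np.asarray(groups)
    ref, pro = groups == reference, groups == protected
    tpr_ref = true_positive_rate(y_true[ref], y_pred[ref])
    tpr_pro = true_positive_rate(y_true[pro], y_pred[pro])
    ratio = tpr_pro / tpr_ref
    lo, hi = bounds
    return {
        "tpr_reference": tpr_ref,
        "tpr_protected": tpr_pro,
        "ratio": ratio,
        "passes": bool(lo <= ratio <= hi),
    }
```

Because equal opportunity looks only at true cases, its estimate for a small subgroup rests on few patients, so in practice the ratio is often reported with a confidence interval rather than as a single pass/fail flag.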


