Subpopulation-specific synthetic electronic health records can increase mortality prediction performance.

Impact factor 3.4; JCR Q2 (Health Care Sciences & Services)
JAMIA Open, volume 8, issue 4, article ooaf091. Published 2025-08-07 (eCollection 2025-08-01). DOI: 10.1093/jamiaopen/ooaf091
Oriel Perets, Nadav Rappoport
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342355/pdf/

Abstract

Objective: To address biased representation in Electronic Health Records (EHRs) across subpopulations (SPs), which leads to predictive models underperforming for underrepresented groups, we propose a framework to enhance equitable predictive performance.

Materials and methods: We developed a framework using generative adversarial networks (GANs) to create SP-specific synthetic data, which augments the original training datasets. Subsequently, we employed an ensemble approach, training distinct prediction models tailored to each SP.
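The data flow described in this paragraph (augment each SP with its own synthetic records, then train one model per SP and route each patient to the model for their SP) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `synthesize` is a stand-in that resamples real rows with Gaussian jitter instead of a GAN, `CentroidModel` is a toy classifier, and `target_size`, the record layout, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def synthesize(rows, n_new, rng, noise=0.05):
    """Stand-in generator: resample real rows and jitter their features.

    NOT a GAN. It only marks the place where an SP-specific generator
    would plug in. Each row is (features, label).
    """
    out = []
    for _ in range(n_new):
        features, label = rng.choice(rows)
        out.append(([x + rng.gauss(0.0, noise) for x in features], label))
    return out


class CentroidModel:
    """Toy binary classifier: scores by relative distance to class centroids."""

    def fit(self, rows):
        by_label = defaultdict(list)
        for features, label in rows:
            by_label[label].append(features)
        # One centroid per class (requires both labels 0 and 1 to be present).
        self.centroids = {
            label: [mean(col) for col in zip(*feats)]
            for label, feats in by_label.items()
        }
        return self

    def predict_proba(self, features):
        def dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid)) ** 0.5

        d0, d1 = dist(self.centroids[0]), dist(self.centroids[1])
        # Closer to the class-1 centroid gives a score nearer to 1.
        return d0 / (d0 + d1) if (d0 + d1) > 0 else 0.5


def train_sp_ensemble(records, target_size, seed=0):
    """records: list of (sp, features, label). Returns {sp: fitted model}."""
    rng = random.Random(seed)
    by_sp = defaultdict(list)
    for sp, features, label in records:
        by_sp[sp].append((features, label))

    models = {}
    for sp, rows in by_sp.items():
        # Top up each SP with synthetic rows drawn only from that SP.
        n_missing = max(0, target_size - len(rows))
        augmented = rows + synthesize(rows, n_missing, rng)
        models[sp] = CentroidModel().fit(augmented)
    return models


def predict(models, sp, features):
    """Route a patient to the model trained for their SP."""
    return models[sp].predict_proba(features)


# Toy data (invented for illustration): SP "B" is underrepresented.
records = [
    ("A", [0.1, 0.2], 0), ("A", [0.2, 0.1], 0), ("A", [0.15, 0.25], 0),
    ("A", [0.9, 0.8], 1), ("A", [0.8, 0.9], 1), ("A", [0.85, 0.95], 1),
    ("B", [0.7, 0.1], 0), ("B", [0.1, 0.7], 1),
]
models = train_sp_ensemble(records, target_size=6)
risk_b = predict(models, "B", [0.15, 0.65])
```

Keeping the generator behind a single function makes it straightforward to swap the jitter-based stand-in for a trained generative model without touching the routing logic.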

Results: The proposed framework was evaluated on two datasets derived from the MIMIC database, achieving a performance improvement in Receiver Operating Characteristics Area Under Curve (ROCAUC) ranging from 8% to 31% for underrepresented SPs.
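For readers who want to reproduce this kind of comparison, the sketch below computes ROC AUC from its pairwise definition (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half) and reports both absolute and relative change. The abstract does not state whether the 8% to 31% figures are absolute or relative, so both are shown. The labels and scores are toy values, not results from the paper, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def improvement(baseline_auc, new_auc):
    """Return (absolute, relative) change of new_auc over baseline_auc."""
    absolute = new_auc - baseline_auc
    relative = absolute / baseline_auc
    return absolute, relative


# Toy example for one subpopulation (values invented for illustration).
labels = [0, 0, 1, 1]
baseline_scores = [0.1, 0.6, 0.4, 0.8]   # one positive is ranked below a negative
sp_model_scores = [0.1, 0.3, 0.4, 0.8]   # all positives are ranked above negatives

auc_before = roc_auc(labels, baseline_scores)
auc_after = roc_auc(labels, sp_model_scores)
abs_gain, rel_gain = improvement(auc_before, auc_after)
```

The pairwise form is quadratic in the number of cases, which is fine for a sanity check; a rank-based implementation is preferable for full cohorts.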

Discussion: The results indicate that targeted synthetic data augmentation and SP-specific model training significantly mitigate the performance disparities observed in conventional predictive models trained on imbalanced EHR data.

Conclusion: Our novel GAN-based framework, combined with an ensemble prediction approach, effectively enhances predictive equity across SPs. The code and ensemble models developed in this study are publicly available, supporting further research and practical adoption of equitable predictive analytics in healthcare.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks