Authors: Oriel Perets, Nadav Rappoport
Journal: JAMIA Open, volume 8, issue 4, article ooaf091 (published 2025-08-07)
DOI: https://doi.org/10.1093/jamiaopen/ooaf091
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342355/pdf/
Subpopulation-specific synthetic electronic health records can increase mortality prediction performance.
Objective: To address biased representation in Electronic Health Records (EHRs) across subpopulations (SPs), which leads to predictive models underperforming for underrepresented groups, we propose a framework to enhance equitable predictive performance.
Materials and methods: We developed a framework using generative adversarial networks (GANs) to create SP-specific synthetic data, which augments the original training datasets. Subsequently, we employed an ensemble approach, training distinct prediction models tailored to each SP.
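To make the two steps above concrete, here is a minimal sketch of SP-specific augmentation followed by a per-SP ensemble. It is not the authors' implementation. The `synthesize` function is only a placeholder for a trained SP-specific GAN (it resamples real rows with Gaussian jitter), `CentroidModel` is a toy stand-in for a real prediction model, and the `target_size` parameter, the subpopulation names, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean


def synthesize(rows, n_new, rng, noise=0.05):
    """Placeholder for an SP-specific GAN (assumption): jittered resampling of real rows."""
    out = []
    for _ in range(n_new):
        features, label = rng.choice(rows)
        out.append(([x + rng.gauss(0.0, noise) for x in features], label))
    return out


class CentroidModel:
    """Toy stand-in classifier: predicts the label whose feature centroid is closest."""

    def fit(self, rows):
        by_label = defaultdict(list)
        for features, label in rows:
            by_label[label].append(features)
        # One centroid per label, computed column by column.
        self.centroids = {
            label: [mean(col) for col in zip(*feats)]
            for label, feats in by_label.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train_ensemble(data_by_sp, target_size, seed=0):
    """Train one model per subpopulation on its real rows plus synthetic rows."""
    rng = random.Random(seed)
    models = {}
    for sp, rows in data_by_sp.items():
        # Top up smaller subpopulations to target_size (hypothetical policy).
        n_new = max(0, target_size - len(rows))
        augmented = rows + synthesize(rows, n_new, rng)
        models[sp] = CentroidModel().fit(augmented)
    return models


def predict(models, sp, features):
    """Route a patient to the model trained for their subpopulation."""
    return models[sp].predict(features)


# Toy data: (features, label) pairs grouped by hypothetical subpopulation.
data_by_sp = {
    "sp_a": [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)],
    "sp_b": [([0.9, 0.1], 0), ([0.1, 0.9], 1)],
}
models = train_ensemble(data_by_sp, target_size=6)
```

Calling `predict(models, "sp_b", [0.05, 0.95])` routes the record to the model trained for `sp_b`. In a real pipeline the placeholder generator would be replaced by a GAN trained on that subpopulation's records, and the centroid model by whichever classifier is under study.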
Results: The proposed framework was evaluated on two datasets derived from the MIMIC database, achieving an improvement in area under the receiver operating characteristic curve (ROC AUC) ranging from 8% to 31% for underrepresented SPs.
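A per-subpopulation evaluation of this kind can be sketched as follows. This is an illustrative example, not the study's evaluation code. It computes ROC AUC through its rank interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half) and reports it separately for each subpopulation. The record layout, subpopulation names, and scores are made up for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subpopulation(records):
    """records: iterable of (subpopulation, true_label, predicted_score) tuples."""
    grouped = {}
    for sp, label, score in records:
        labels, scores = grouped.setdefault(sp, ([], []))
        labels.append(label)
        scores.append(score)
    return {sp: roc_auc(labels, scores) for sp, (labels, scores) in grouped.items()}


# Made-up predictions for two hypothetical subpopulations.
records = [
    ("sp_a", 1, 0.9), ("sp_a", 0, 0.2), ("sp_a", 1, 0.7), ("sp_a", 0, 0.4),
    ("sp_b", 1, 0.6), ("sp_b", 0, 0.7), ("sp_b", 1, 0.8), ("sp_b", 0, 0.3),
]
result = auc_by_subpopulation(records)
print(result)  # {'sp_a': 1.0, 'sp_b': 0.75}
```

Reporting the metric per subpopulation rather than pooled is what exposes the gap: a pooled ROC AUC can look acceptable while a small group such as `sp_b` lags behind.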
Discussion: The results indicate that targeted synthetic data augmentation and SP-specific model training significantly mitigate the performance disparities observed in conventional predictive models trained on imbalanced EHR data.
Conclusion: Our novel GAN-based framework, combined with an ensemble prediction approach, effectively enhances predictive equity across SPs. The code and ensemble models developed in this study are publicly available, supporting further research and practical adoption of equitable predictive analytics in healthcare.