Synthetic data and health equity: accounting for racism and sexism in health care delivery
Stephanie Teeple, Luis Emilio Muñoz, Jaya Aysola
Health Affairs Scholar, 3(9), qxaf165. Published 2025-08-19.
DOI: https://doi.org/10.1093/haschl/qxaf165
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449041/pdf/
Abstract
Introduction: Synthetic data are a promising new tool for answering health service research questions, including those relevant to health equity. However, it is unclear whether synthetic data can accurately capture inequities in health care. If they cannot, applying them to the real world may perpetuate racial and ethnic health inequities.
Methods: In this study, we determine to what extent Synthea, a popular open-source synthetic electronic health record data generator, captures racial, ethnic, and sex disparities in clinical practice, and we evaluate whether the data can be augmented by other publicly available data sources. We examine rates of intervention for 3 common medical conditions: myocardial infarction, chronic obstructive pulmonary disease, and type II diabetes mellitus.
Results: For 2 of the 3 conditions, Synthea data showed higher rates of intervention for all patients, and attenuated or no disparities in intervention, compared with the comparator literature. After incorporating data on race, ethnicity, and sex disparities from the Dartmouth Atlas, the updated Synthea proportions approached their literature counterparts in both absolute and relative terms.
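One common way to read "absolute and relative terms" is as a rate difference and a rate ratio between a group and a reference group. The sketch below illustrates that reading only. The abstract does not state which measures the authors used, and the class, the function names, and all counts are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCounts:
    """Counts for one demographic group (hypothetical, not study data)."""
    eligible: int  # patients with the condition
    treated: int   # patients who received the intervention

    @property
    def rate(self) -> float:
        """Proportion of eligible patients who received the intervention."""
        return self.treated / self.eligible


def absolute_disparity(group: GroupCounts, reference: GroupCounts) -> float:
    """Rate difference: group rate minus reference rate."""
    return group.rate - reference.rate


def relative_disparity(group: GroupCounts, reference: GroupCounts) -> float:
    """Rate ratio: group rate divided by reference rate."""
    return group.rate / reference.rate


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
reference = GroupCounts(eligible=1000, treated=500)
comparison = GroupCounts(eligible=1000, treated=400)

print(f"absolute disparity: {absolute_disparity(comparison, reference):+.2f}")
print(f"relative disparity: {relative_disparity(comparison, reference):.2f}")
```

A synthetic data set can match the literature on one measure and not the other. For example, if intervention rates are inflated for every group, the ratio can look right while the difference does not, so comparing both is informative.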
Conclusion: If using synthetic data, researchers and policymakers can work to ensure such data accurately reflect downstream effects of social forces in order to mitigate inadvertent harm to minoritized populations.