Cameron Razieh, Bethan Powell, Rosemary Drummond, Isobel L Ward, Jasper Morgan, Myer Glickman, Chris White, Francesco Zaccardi, Jonathan Hope, Veena Raleigh, Ashley Akbari, Nazrul Islam, Thomas Yates, Lisa Murphy, Bilal A Mateen, Kamlesh Khunti, Vahe Nafilyan
Understanding the quality of ethnicity data recorded in health-related administrative data sources compared with Census 2021 in England.
PLoS Medicine, volume 22, issue 2, e1004507. Published 2025-02-26. DOI: https://doi.org/10.1371/journal.pmed.1004507
Abstract
Background: Electronic health records (EHRs) are increasingly used to investigate health inequalities across ethnic groups. Although some studies show that the recording of ethnicity in EHRs is imperfect, there is no robust evidence on how closely the ethnicity information recorded in various real-world sources agrees with census data.
Methods and findings: We linked primary and secondary care NHS England data sources with Census 2021 data and compared, at the individual level, the ethnicity recorded in General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR), Hospital Episode Statistics (HES), Ethnic Category Information Asset (ECIA), and Talking Therapies for anxiety and depression (TT) with the ethnicity reported in the census. Census ethnicity is self-reported and, therefore, regarded as the most reliable population-level source of ethnicity recording. We further assessed the impact of multiple approaches to assigning a person an ethnic category. The numbers of people who could be linked to the census from ECIA, GDPPR, HES, and TT were 47.4 million, 43.5 million, 47.8 million, and 6.3 million, respectively. Across all 4 data sources, the White British category had the highest level of agreement with the census (≥96%), followed by the Bangladeshi category (≥93%). Levels of agreement for the Pakistani, Indian, and Chinese categories were ≥87%, ≥83%, and ≥80%, respectively, across all sources. Agreement was lower for the Mixed (≤75%) and Other (≤71%) categories across all data sources. The categories with the lowest agreement were Gypsy or Irish Traveller (≤6%), Other Black (≤19%), and Any Other Ethnic Group (≤25%).
Conclusions: Certain ethnic categories across all data sources have high discordance with census ethnic categories. These differences may lead to biased estimates of differences in health outcomes between ethnic groups, a critical data point used when making health policy and planning decisions.
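To make the idea of per-category agreement concrete, the following is a minimal Python sketch and not the authors' method. It assumes that agreement for a category is the percentage of linked people reporting that category in the census whose health-data record carries the same category; the abstract does not state the exact definition. The function name `agreement_by_category` and the toy records are hypothetical and purely illustrative.

```python
from collections import Counter


def agreement_by_category(linked_pairs):
    """Percent agreement per census category.

    linked_pairs: iterable of (census_category, source_category) tuples,
    one per linked person. Assumption: the census category is the
    denominator, since the census is treated as the reference.
    """
    totals = Counter()   # people per census category
    matches = Counter()  # of those, people whose source record matches
    for census_cat, source_cat in linked_pairs:
        totals[census_cat] += 1
        if source_cat == census_cat:
            matches[census_cat] += 1
    return {cat: 100.0 * matches[cat] / totals[cat] for cat in totals}


# Made-up records for illustration only; these are not study data.
toy_pairs = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("Bangladeshi", "Bangladeshi"),
    ("Other Black", "Black African"),
    ("Other Black", "Other Black"),
]

agreement = agreement_by_category(toy_pairs)
for category, pct in sorted(agreement.items()):
    print(f"{category}: {pct:.0f}%")
```

Under this assumed definition, a low value for a category means that many people who self-report it in the census are recorded under a different category in the health data source, which is the kind of discordance the conclusions describe.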
About the journal:
PLOS Medicine is a prominent platform for discussing and researching global health challenges. The journal covers a wide range of topics, including biomedical, environmental, social, and political factors affecting health. It prioritizes articles that contribute to clinical practice, health policy, or a better understanding of pathophysiology, ultimately aiming to improve health outcomes across different settings.
The journal is committed to upholding the highest ethical standards in medical publishing. This includes actively managing and disclosing any conflicts of interest related to reporting, reviewing, and publishing. PLOS Medicine promotes transparency throughout the review and publication process, and it encourages data sharing and the reuse of published work. Additionally, authors retain copyright for their work, and the publication is made accessible through Open Access with no restrictions on availability and dissemination.
PLOS Medicine takes measures to avoid conflicts of interest associated with advertising drugs and medical devices or engaging in the exclusive sale of reprints.