Weighing the benefits and risks of collecting race and ethnicity data in clinical settings for medical artificial intelligence

IF 23.8 · CAS Tier 1 (Medicine) · JCR Q1 (Medical Informatics)
Amelia Fiske PhD , Sarah Blacker PhD , Lester Darryl Geneviève PhD , Theresa Willem MA , Marie-Christine Fritzsche , Alena Buyx MD , Leo Anthony Celi MD , Stuart McLennan PhD
{"title":"Weighing the benefits and risks of collecting race and ethnicity data in clinical settings for medical artificial intelligence","authors":"Amelia Fiske PhD ,&nbsp;Sarah Blacker PhD ,&nbsp;Lester Darryl Geneviève PhD ,&nbsp;Theresa Willem MA ,&nbsp;Marie-Christine Fritzsche ,&nbsp;Alena Buyx MD ,&nbsp;Leo Anthony Celi MD ,&nbsp;Stuart McLennan PhD","doi":"10.1016/j.landig.2025.01.003","DOIUrl":null,"url":null,"abstract":"<div><div>Many countries around the world do not collect race and ethnicity data in clinical settings. Without such identified data, it is difficult to identify biases in the training data or output of a given artificial intelligence (AI) algorithm, and to work towards medical AI tools that do not exclude or further harm marginalised groups. However, the collection of these data also poses specific risks to racially minoritised populations and other marginalised groups. This Viewpoint weighs the risks of collecting race and ethnicity data in clinical settings against the risks of not collecting those data. The collection of more comprehensive identified data (ie, data that include personal attributes such as race, ethnicity, and sex) has the possibility to benefit racially minoritised populations that have historically faced worse health outcomes and health-care access, and inadequate representation in research. However, the collection of extensive demographic data raises important concerns that include the construction of intersectional social categories (ie, race and its shifting meaning in different sociopolitical contexts), the risks of biological reductionism, and the potential for misuse, particularly in situations of historical exclusion, violence, conflict, genocide, and colonialism. Careful navigation of identified data collection is key to building better AI algorithms and to work towards medicine that does not exclude or harm marginalised groups.</div></div>","PeriodicalId":48534,"journal":{"name":"Lancet Digital Health","volume":"7 4","pages":"Pages e286-e294"},"PeriodicalIF":23.8000,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Lancet Digital Health","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589750025000032","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
Citations: 0

Abstract

Many countries around the world do not collect race and ethnicity data in clinical settings. Without such identified data, it is difficult to identify biases in the training data or output of a given artificial intelligence (AI) algorithm, and to work towards medical AI tools that do not exclude or further harm marginalised groups. However, the collection of these data also poses specific risks to racially minoritised populations and other marginalised groups. This Viewpoint weighs the risks of collecting race and ethnicity data in clinical settings against the risks of not collecting those data. The collection of more comprehensive identified data (ie, data that include personal attributes such as race, ethnicity, and sex) has the potential to benefit racially minoritised populations that have historically faced worse health outcomes, poorer access to health care, and inadequate representation in research. However, the collection of extensive demographic data raises important concerns, including the construction of intersectional social categories (ie, race and its shifting meaning in different sociopolitical contexts), the risks of biological reductionism, and the potential for misuse, particularly in contexts marked by histories of exclusion, violence, conflict, genocide, and colonialism. Careful navigation of identified data collection is key to building better AI algorithms and to working towards medicine that does not exclude or harm marginalised groups.
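The Viewpoint's central technical claim, that bias in an algorithm's output can only be audited when identified data exist, can be made concrete with a minimal sketch. The example below is not from the article: the cohort, group labels, model, and metrics are entirely synthetic assumptions chosen for illustration. It trains a simple classifier and then compares its sensitivity across a recorded group attribute, a comparison that is impossible when that attribute was never collected.

```python
# Illustrative sketch (not from the Viewpoint): a subgroup audit that is only
# possible because a group label was recorded. All data are synthetic and the
# group names, model, and metric choices are assumptions for demonstration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 5000

# Synthetic cohort: the outcome's relationship to the features differs by a
# hypothetical group "B", mimicking an unevenly represented training population.
group = rng.choice(["A", "B"], size=n, p=[0.8, 0.2])
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
logit = 0.8 * x1 - 0.5 * x2 + np.where(group == "B", -0.7, 0.0)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"x1": x1, "x2": x2, "group": group, "y": y})

# Train a model without access to group membership, as is common in practice.
model = LogisticRegression().fit(df[["x1", "x2"]], df["y"])
df["pred"] = model.predict(df[["x1", "x2"]])

# The audit step requires the recorded attribute: without it, only the overall
# figure could be reported, and any between-group difference would be invisible.
print(f"Overall sensitivity: {recall_score(df['y'], df['pred']):.2f}")
for name, g in df.groupby("group"):
    print(f"Group {name} sensitivity: {recall_score(g['y'], g['pred']):.2f}")
```

The same pattern applies to any per-group metric (calibration, false-positive rate, and so on): the auditing step depends on the very attribute that the rest of the pipeline may deliberately ignore.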
Source journal: The Lancet Digital Health
CiteScore: 41.20
Self-citation rate: 1.60%
Articles published per year: 232
Review time: 13 weeks
About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal's open-access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types, including Articles, Reviews, Comments, and Correspondence, contributing to the promotion of digital technologies in health practice worldwide.