Emerging algorithmic bias: fairness drift as the next dimension of model maintenance and sustainability.

IF 4.6 · CAS Tier 2 (Medicine) · JCR Q1 (Computer Science, Information Systems)
Sharon E Davis, Chad Dorn, Daniel J Park, Michael E Matheny
DOI: 10.1093/jamia/ocaf039
Journal: Journal of the American Medical Informatics Association, pages 845-854
Publication date: 2025-05-01 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012346/pdf/
Citations: 0

Abstract


Objectives: While performance drift of clinical prediction models is well-documented, the potential for algorithmic biases to emerge post-deployment has had limited characterization. A better understanding of how temporal model performance may shift across subpopulations is required to incorporate fairness drift into model maintenance strategies.

Materials and methods: We explore fairness drift in a national population over 11 years, with and without model maintenance aimed at sustaining population-level performance. We trained random forest models predicting 30-day post-surgical readmission, mortality, and pneumonia using 2013 data from US Department of Veterans Affairs facilities. We evaluated performance quarterly from 2014 to 2023 by self-reported race and sex. We estimated discrimination, calibration, and accuracy, and operationalized fairness using metric parity measured as the gap between disadvantaged and advantaged groups.
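The fairness operationalization above can be made concrete. The sketch below (not the authors' implementation; `auroc` and `fairness_gap` are illustrative helpers, and the toy data are synthetic) computes a metric-parity gap as the difference in discrimination (AUROC) between a disadvantaged and an advantaged subgroup, with 0 indicating parity:

```python
import random

def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random
    negative (ties count half) -- the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fairness_gap(y_true, y_score, group, disadvantaged, advantaged):
    """Metric-parity gap: AUROC(disadvantaged) - AUROC(advantaged)."""
    def subgroup_auc(g):
        sub = [(y, s) for y, s, gr in zip(y_true, y_score, group) if gr == g]
        return auroc([y for y, _ in sub], [s for _, s in sub])
    return subgroup_auc(disadvantaged) - subgroup_auc(advantaged)

# Toy data: model scores are noisier (less informative) for group "A".
random.seed(0)
y, s, g = [], [], []
for _ in range(2000):
    grp = random.choice("AB")
    label = random.randint(0, 1)
    noise = 1.2 if grp == "A" else 0.6
    y.append(label)
    s.append(label + random.gauss(0, noise))
    g.append(grp)

gap = fairness_gap(y, s, g, disadvantaged="A", advantaged="B")
print(f"AUROC gap (A - B): {gap:.3f}")  # negative: group A fares worse
```

Tracking this gap per quarter, as the study does, turns fairness drift into a time series that can be monitored alongside population-level performance; the same pattern applies to calibration or accuracy gaps.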

Results: Our cohort included 1 739 666 surgical cases. We observed fairness drift in both the original and temporally updated models. Model updating had a larger impact on overall performance than fairness gaps. During periods of stable fairness, updating models at the population level increased, decreased, or did not impact fairness gaps. During periods of fairness drift, updating models restored fairness in some cases and exacerbated fairness gaps in others.

Discussion: This exploratory study highlights that algorithmic fairness cannot be assured through one-time assessments during model development. Temporal changes in fairness may take multiple forms and interact with model updating strategies in unanticipated ways.

Conclusion: Equitable and sustainable clinical artificial intelligence deployments will require novel methods to monitor algorithmic fairness, detect emerging bias, and adopt model updates that promote fairness.

Source journal
Journal of the American Medical Informatics Association (Medicine / Computer Science: Interdisciplinary Applications)
CiteScore
14.50
Self-citation rate
7.80%
Annual articles
230
Review turnaround
3-8 weeks
About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.