Improving the Fairness of Chest X-ray Classifiers

Proceedings of the ACM Conference on Health, Inference, and Learning Pub Date : 2022-03-23 DOI:10.48550/arXiv.2203.12609

Haoran Zhang, Natalie Dullerud, Karsten Roth, Lauren Oakden-Rayner, S. Pfohl, M. Ghassemi

Pages: 204-233

Citations: 26

Abstract

Deep learning models have reached or surpassed human-level performance in the field of medical imaging, especially in disease diagnosis using chest X-rays. However, prior work has found that such classifiers can exhibit biases in the form of gaps in predictive performance across protected groups. In this paper, we question whether striving to achieve zero disparities in predictive performance (i.e., group fairness) is the appropriate fairness definition in the clinical setting, over minimax fairness, which focuses on maximizing the performance of the worst-case group. We benchmark the performance of nine methods in improving classifier fairness across these two definitions. We find, consistent with prior work on non-clinical data, that methods which strive to achieve better worst-group performance do not outperform simple data balancing. We also find that methods which achieve group fairness do so by worsening performance for all groups. In light of these results, we discuss the utility of fairness definitions in the clinical setting, advocating for an investigation of the bias-inducing mechanisms in the underlying data-generating process whenever possible.
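The two fairness definitions contrasted in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code. It assumes plain accuracy as the per-group performance measure (the abstract does not name the metric). It also assumes that "simple data balancing" means resampling so that every group is equally represented. The names `per_group_accuracy`, `fairness_summary`, and `balanced_indices` are hypothetical.

```python
from collections import defaultdict
import random


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy}. Accuracy is an assumed stand-in metric."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def fairness_summary(acc_by_group):
    """Summarize per-group performance under both fairness definitions."""
    worst = min(acc_by_group.values())
    best = max(acc_by_group.values())
    return {
        # Group fairness: the disparity between groups; the goal is zero.
        "gap": best - worst,
        # Minimax fairness: performance of the worst-case group; higher is better.
        "worst_group": worst,
    }


def balanced_indices(groups, seed=0):
    """Oversample with replacement so each group matches the largest group.

    This is one possible reading of "simple data balancing" (an assumption).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    target = max(len(ix) for ix in by_group.values())
    out = []
    for ix in by_group.values():
        out.extend(ix)
        out.extend(rng.choices(ix, k=target - len(ix)))
    return out


# Toy comparison with made-up per-group accuracies: model B closes the gap
# only by lowering every group's accuracy, the pattern the abstract warns about.
model_a = fairness_summary({"group_1": 0.75, "group_2": 0.5})
model_b = fairness_summary({"group_1": 0.25, "group_2": 0.25})
print(model_a)  # gap 0.25, worst_group 0.5
print(model_b)  # gap 0.0, worst_group 0.25
```

In this toy comparison, model B looks better under group fairness (zero gap) but worse under minimax fairness (lower worst-group accuracy), which is why the choice of definition matters in a clinical setting.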
