Disparate Interactions: An Algorithm-in-the-Loop Analysis of Fairness in Risk Assessments

Proceedings of the Conference on Fairness, Accountability, and Transparency. Pub Date: 2019-01-29. DOI: 10.1145/3287560.3287563

Ben Green, Yiling Chen


Citations: 200

Abstract

Despite vigorous debates about the technical characteristics of risk assessments being deployed in the U.S. criminal justice system, remarkably little research has studied how these tools affect actual decision-making processes. After all, risk assessments do not make definitive decisions; they inform judges, who are the final arbiters. It is therefore essential that considerations of risk assessments be informed by rigorous studies of how judges actually interpret and use them. This paper takes a first step toward such research on human interactions with risk assessments through a controlled experimental study on Amazon Mechanical Turk. We found several behaviors that call into question the supposed efficacy and fairness of risk assessments: our study participants 1) underperformed the risk assessment even when presented with its predictions, 2) could not effectively evaluate the accuracy of their own or the risk assessment's predictions, and 3) exhibited behaviors fraught with "disparate interactions," whereby the use of risk assessments led to higher risk predictions about black defendants and lower risk predictions about white defendants. These results suggest the need for a new "algorithm-in-the-loop" framework that places machine learning decision-making aids into the sociotechnical context of improving human decisions rather than the technical context of generating the best prediction in the abstract. If risk assessments are to be used at all, they must be grounded in rigorous evaluations of their real-world impacts instead of in their theoretical potential.



