The Role of Accuracy in Algorithmic Process Fairness Across Multiple Domains

M. Albach, J. R. Wright
{"title":"The Role of Accuracy in Algorithmic Process Fairness Across Multiple Domains","authors":"M. Albach, J. R. Wright","doi":"10.1145/3465456.3467620","DOIUrl":null,"url":null,"abstract":"Machine learning is often used to aid in human decision-making, sometimes for life-altering decisions like when determining whether or not to grant bail to a defendant or a loan to an applicant. Because of their importance, it is critical to ensure that the processes used to reach these decisions are considered fair. A common approach is to enforce some fairness constraint over the outcomes of a decision maker, but there is no single, generally-accepted definition of fairness. With notable exceptions, most of the literature on algorithmic fairness takes for granted that there will be an inherent trade-off between accuracy and algorithmic fairness. Additionally, most work focuses only on one or two domains, whereas machine learning techniques are used in an increasing number of distinct decision-making contexts with differing pertinent features. In this work, we consider six different decision-making domains: bail, child protective services, hospital resources, insurance rates, loans, and unemployment aid. We focus on the fairness of the process directly, rather than the outcomes. We also take a descriptive approach, using survey data to elicit the factors that lead a decision-making process to be perceived as fair. Specifically, we ask 2157 Amazon Mechanical Turk workers to rate the features used for algorithmic decision-making in one of the six domains as either fair or unfair, as well as to rate how much they agree or disagree with the assignments of eight previously (and one newly) proposed properties to the features. For example, a worker could be asked to rate the feature of \"criminal history\" as fair or unfair to use in bail decisions, and then rate how much they agree or disagree that \"criminal history\" is a reliable feature. We show that, in every domain, disagreements in fairness judgements can be largely explained by the assignments of properties (like reliability) to features (like criminal history). We also show that fairness judgements can be well predicted across domains by training the predictor using the property assignments from one domain's data and predicting in another. These findings imply that the properties act as moral determinants for fairness judgements, and that respondents reason similarly about the implications of the properties in all the decision-making domains that we consider. Although our results are mostly consistent across domains, we find some important differences within specific demographic groups in the hospital and insurance domains, indicating that at least some differences in fairness judgements are introduced by demographic differences. However, a single property usually holds the majority of the predictive power. 
With some exceptions, predictors learning from only the \"increases accuracy\" property perform better (in all domains) than predictors learning from any combination of the other seven properties, implying that the primary factor affecting respondents' perceptions of the fairness of using a feature for prediction is whether or not a feature increases the accuracy of the decision being made.","PeriodicalId":395676,"journal":{"name":"Proceedings of the 22nd ACM Conference on Economics and Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM Conference on Economics and Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3465456.3467620","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Machine learning is often used to aid in human decision-making, sometimes for life-altering decisions like when determining whether or not to grant bail to a defendant or a loan to an applicant. Because of their importance, it is critical to ensure that the processes used to reach these decisions are considered fair. A common approach is to enforce some fairness constraint over the outcomes of a decision maker, but there is no single, generally-accepted definition of fairness. With notable exceptions, most of the literature on algorithmic fairness takes for granted that there will be an inherent trade-off between accuracy and algorithmic fairness. Additionally, most work focuses only on one or two domains, whereas machine learning techniques are used in an increasing number of distinct decision-making contexts with differing pertinent features. In this work, we consider six different decision-making domains: bail, child protective services, hospital resources, insurance rates, loans, and unemployment aid. We focus on the fairness of the process directly, rather than the outcomes. We also take a descriptive approach, using survey data to elicit the factors that lead a decision-making process to be perceived as fair. Specifically, we ask 2157 Amazon Mechanical Turk workers to rate the features used for algorithmic decision-making in one of the six domains as either fair or unfair, as well as to rate how much they agree or disagree with the assignments of eight previously (and one newly) proposed properties to the features. For example, a worker could be asked to rate the feature of "criminal history" as fair or unfair to use in bail decisions, and then rate how much they agree or disagree that "criminal history" is a reliable feature. We show that, in every domain, disagreements in fairness judgements can be largely explained by the assignments of properties (like reliability) to features (like criminal history). We also show that fairness judgements can be well predicted across domains by training the predictor using the property assignments from one domain's data and predicting in another. These findings imply that the properties act as moral determinants for fairness judgements, and that respondents reason similarly about the implications of the properties in all the decision-making domains that we consider. Although our results are mostly consistent across domains, we find some important differences within specific demographic groups in the hospital and insurance domains, indicating that at least some differences in fairness judgements are introduced by demographic differences. However, a single property usually holds the majority of the predictive power. With some exceptions, predictors learning from only the "increases accuracy" property perform better (in all domains) than predictors learning from any combination of the other seven properties, implying that the primary factor affecting respondents' perceptions of the fairness of using a feature for prediction is whether or not a feature increases the accuracy of the decision being made.
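The cross-domain prediction described in the abstract can be illustrated with a small sketch: train a classifier on one domain's property assignments, using the feature's fair/unfair judgement as the label, then evaluate it on a different domain. This is only a minimal illustration, not the authors' code; the property column names, the "judged_fair" label, the CSV layout, and the choice of logistic regression are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation) of cross-domain prediction:
# fit on property ratings from one domain, predict fairness judgements in another.
# Column names and file layout below are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Placeholder names for the eight previously proposed properties plus the
# newly proposed "increases accuracy" property studied in the paper.
PROPERTIES = [
    "relevance", "reliability", "privacy", "volitionality",
    "causes_outcome", "caused_by_sensitive_group", "causes_vicious_cycle",
    "causes_disparity",
    "increases_accuracy",  # the property found to carry most predictive power
]

def cross_domain_accuracy(df: pd.DataFrame, train_domain: str, test_domain: str) -> float:
    """Train on one domain's property assignments, test fairness judgements in another."""
    train = df[df["domain"] == train_domain]
    test = df[df["domain"] == test_domain]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train[PROPERTIES], train["judged_fair"])
    return accuracy_score(test["judged_fair"], clf.predict(test[PROPERTIES]))

# Example usage with a hypothetical survey file:
# df = pd.read_csv("survey_responses.csv")
# print(cross_domain_accuracy(df, train_domain="bail", test_domain="loans"))
```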