Evaluating Proposed Fairness Models for Face Recognition Algorithms

ICPR Workshops, Pub Date: 2022-03-09, DOI: 10.48550/arXiv.2203.05051

John J. Howard, Eli J. Laird, Yevgeniy B. Sirotin, Rebecca E. Rubin, Jerry L. Tipton, A. Vemury

Citations: 7

Abstract

The development of face recognition algorithms by academic and commercial organizations is growing rapidly due to the onset of deep learning and the widespread availability of training data. Though tests of face recognition algorithm performance indicate yearly performance gains, error rates for many of these systems differ based on the demographic composition of the test set. These "demographic differentials" in algorithm performance can contribute to unequal or unfair outcomes for certain groups of people, raising concerns as face recognition systems are increasingly adopted worldwide. Consequently, regulatory bodies in both the United States and Europe have proposed new rules requiring audits of biometric systems for "discriminatory impacts" (European Union Artificial Intelligence Act) and "fairness" (U.S. Federal Trade Commission). However, no standard for measuring fairness in biometric systems yet exists. This paper characterizes two proposed measures of face recognition algorithm fairness (fairness measures) from scientists in the U.S. and Europe. We find that both proposed methods are challenging to interpret when applied to disaggregated face recognition error rates as they are commonly experienced in practice. To address this, we propose a set of interpretability criteria, termed the Functional Fairness Measure Criteria (FFMC), that outlines a set of properties desirable in a face recognition algorithm fairness measure. We further develop a new fairness measure, the Gini Aggregation Rate for Biometric Equitability (GARBE), and show how, in conjunction with Pareto optimization, this measure can be used to select among alternative algorithms based on the accuracy/fairness trade-space. Finally, we have open-sourced our dataset of machine-readable, demographically disaggregated error rates. We believe this is currently the largest open-source dataset of its kind.
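The abstract names GARBE only as a Gini-based aggregation of demographically disaggregated error rates, used together with Pareto optimization to choose among algorithms on an accuracy/fairness trade-space. The Python sketch below illustrates that idea under stated assumptions, not as the authors' method: GARBE is modeled here as an alpha-weighted blend of sample-corrected Gini coefficients over per-group false match rates (FMR) and false non-match rates (FNMR); mean FNMR stands in as the accuracy axis; and the function names (`gini`, `garbe`, `pareto_front`, `select_algorithms`), algorithm names, and error rates are all hypothetical toy values, not data from the paper.

```python
"""Sketch: a Gini-based fairness score plus Pareto selection of algorithms.

ASSUMPTION: GARBE is modeled as an alpha-weighted blend of Gini coefficients
over per-group FMR and FNMR; check the paper for the exact definition.
All algorithm names and error rates below are hypothetical toy values.
"""


def gini(values):
    """Gini coefficient of non-negative per-group error rates.

    Uses an n/(n-1) small-sample correction so that identical rates score
    0.0 and all error concentrated in a single group scores 1.0.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need error rates for at least two groups")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # every group is error-free, so the rates are equal
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return (n / (n - 1)) * abs_diffs / (2 * n * n * mean)


def garbe(fmr_by_group, fnmr_by_group, alpha=0.5):
    """Hypothetical GARBE sketch: weighted Gini of FMR and FNMR (0 = even)."""
    return alpha * gini(fmr_by_group) + (1 - alpha) * gini(fnmr_by_group)


def pareto_front(points):
    """Names of (name, error, unfairness) points not dominated by another.

    Lower is better on both axes. A point is dominated if some other point
    is no worse on both axes and strictly better on at least one.
    """
    front = []
    for i, (name, err, unf) in enumerate(points):
        dominated = any(
            e2 <= err and u2 <= unf and (e2 < err or u2 < unf)
            for j, (_, e2, u2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(name)
    return front


def select_algorithms(results, alpha=0.5):
    """Score each algorithm as (name, error, GARBE) and return the front.

    ASSUMPTION: accuracy is summarized as mean FNMR across groups, a
    placeholder choice made only for this illustration.
    """
    points = []
    for name, rates in results.items():
        fnmr = rates["fnmr"]
        error = sum(fnmr) / len(fnmr)
        points.append((name, error, garbe(rates["fmr"], fnmr, alpha)))
    return points, pareto_front(points)


# Hypothetical per-group rates for four demographic groups.
results = {
    "alg_A": {"fmr": [0.001] * 4, "fnmr": [0.02] * 4},
    "alg_B": {"fmr": [0.001] * 4, "fnmr": [0.01, 0.01, 0.01, 0.03]},
    "alg_C": {"fmr": [0.001, 0.002, 0.001, 0.002],
              "fnmr": [0.02, 0.03, 0.02, 0.03]},
}

points, front = select_algorithms(results)
for name, error, score in points:
    print(f"{name}: mean FNMR={error:.4f}, GARBE={score:.4f}")
print("Pareto-optimal:", front)  # → Pareto-optimal: ['alg_A', 'alg_B']
```

In this toy data, alg_C is dropped because alg_A is both more accurate and more even across groups, while alg_A (perfectly even, less accurate) and alg_B (more accurate, less even) both remain on the front as legitimate trade-off choices; picking between them is a policy decision the Pareto step deliberately leaves open.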
