Quantifying Bias in a Face Verification System

Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo
{"title":"人脸验证系统中的量化偏差","authors":"Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo","doi":"10.3390/cmsf2022003006","DOIUrl":null,"url":null,"abstract":": Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. death times for White face embeddings to later than other race groups ( p < 0.05 for W × A , W × I , and W × B t -tests), indicating that White embeddings are more in the embedding space. The other race groups have peak death times that are taller and earlier than the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H 0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that there is more variance for White face distribution in the embedding space compared to other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the ( H 0 ) death times supports previous findings that the White race group is clustered differently to other race groups. We note that there is less inequality in H 0 death times for female vs. male faces, despite our p -value indicating that this discrepancy may be significant ( p < 0.05).","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Quantifying Bias in a Face Verification System\",\"authors\":\"Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo\",\"doi\":\"10.3390/cmsf2022003006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. 
We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. death times for White face embeddings to later than other race groups ( p < 0.05 for W × A , W × I , and W × B t -tests), indicating that White embeddings are more in the embedding space. The other race groups have peak death times that are taller and earlier than the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H 0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that there is more variance for White face distribution in the embedding space compared to other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the ( H 0 ) death times supports previous findings that the White race group is clustered differently to other race groups. We note that there is less inequality in H 0 death times for female vs. male faces, despite our p -value indicating that this discrepancy may be significant ( p < 0.05).\",\"PeriodicalId\":127261,\"journal\":{\"name\":\"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/cmsf2022003006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/cmsf2022003006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, a problem commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may cause serious harm to individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google's FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias.

The H0 death times for White face embeddings are later than those for the other race groups (p < 0.05 for W × A, W × I, and W × B t-tests), indicating that White embeddings are more spread out in the embedding space. The other race groups have H0 death-time density peaks that are taller and earlier than the White race group's. The shorter, wider peak for the White subgroup means there is more variety (higher variance) in H0 death times, rather than the consistent peak around 0.8 with less variance seen for the other race groups. This shows that the White face distribution has more variance in the embedding space than the other race groups, a trend that was not present in the centroid-distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the H0 death times supports previous findings that the White race group is clustered differently from the other race groups. We note that there is less inequality in H0 death times for female vs. male faces, although our p-value indicates that this discrepancy may nonetheless be significant (p < 0.05).
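The abstract does not spell out which statistical fairness metrics the authors compute, but population-specific evaluation of a verification model generally means measuring error rates separately per demographic group. The following is a minimal, assumption-laden sketch of that idea: it computes a per-group false non-match rate and false match rate from cosine similarities between embeddings, with a hypothetical decision threshold; the function names and parameter choices are illustrative and are not taken from the paper.

```python
import numpy as np
from itertools import combinations

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_error_rates(embeddings, identities, groups, threshold=0.5):
    """Per-group false non-match rate (FNMR) and false match rate (FMR).

    embeddings: (n, d) array of face embeddings (e.g., FaceNet outputs)
    identities: length-n sequence of subject IDs
    groups:     length-n sequence of demographic labels
    threshold:  similarity above which a pair is declared a match (illustrative)
    """
    embeddings = np.asarray(embeddings)
    identities = np.asarray(identities)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        genuine, impostor = [], []
        # Score every within-group pair; genuine = same subject, impostor = different subjects.
        for i, j in combinations(idx, 2):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            (genuine if identities[i] == identities[j] else impostor).append(sim)
        rates[g] = {
            "FNMR": float(np.mean(np.asarray(genuine) < threshold)),   # genuine pairs rejected
            "FMR":  float(np.mean(np.asarray(impostor) >= threshold)), # impostor pairs accepted
        }
    return rates
```

For the clustering analysis, the H0 death times of a Vietoris–Rips filtration on a finite point cloud coincide with the edge lengths of the minimum spanning tree of the pairwise-distance graph (the single-linkage merge heights), so they can be sketched without a dedicated TDA library. The sketch below computes per-group death times that way and runs Welch t-tests between a reference group and the others; the authors' actual distance metric, filtration, and test variant are not given here, so treat this as an illustrative assumption rather than their pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.stats import ttest_ind

def h0_death_times(embeddings):
    """Finite H0 death times of the Vietoris-Rips filtration on a point cloud.

    Connected components merge exactly at the edge lengths of the minimum
    spanning tree of the pairwise-distance graph, so those edge weights are
    the finite H0 death times (every H0 birth is 0).
    """
    dists = squareform(pdist(np.asarray(embeddings), metric="euclidean"))
    mst = minimum_spanning_tree(dists)   # sparse matrix holding the n - 1 MST edges
    return np.sort(mst.data)

def compare_death_times(embeddings_by_group, reference="White"):
    """Welch t-tests of a reference group's death times against each other group.

    embeddings_by_group: dict mapping a demographic label to an (n, d) array
    of that group's face embeddings (hypothetical input format).
    """
    ref_deaths = h0_death_times(embeddings_by_group[reference])
    for group, emb in embeddings_by_group.items():
        if group == reference:
            continue
        deaths = h0_death_times(emb)
        stat, p = ttest_ind(ref_deaths, deaths, equal_var=False)  # Welch's t-test
        print(f"{reference} vs. {group}: mean death "
              f"{ref_deaths.mean():.3f} vs. {deaths.mean():.3f}, "
              f"t = {stat:.2f}, p = {p:.4f}")
```

Later death times mean that components persist longer before merging, i.e., the group's embeddings are more spread out in the embedding space, which is the pattern the excerpt reports for the White subgroup.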