Quantifying Bias in a Face Verification System
Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo
AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD), 20 April 2022. DOI: 10.3390/cmsf2022003006
{"title":"人脸验证系统中的量化偏差","authors":"Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo","doi":"10.3390/cmsf2022003006","DOIUrl":null,"url":null,"abstract":": Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias. death times for White face embeddings to later than other race groups ( p < 0.05 for W × A , W × I , and W × B t -tests), indicating that White embeddings are more in the embedding space. The other race groups have peak death times that are taller and earlier than the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H 0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that there is more variance for White face distribution in the embedding space compared to other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots. Thus, our analysis of the ( H 0 ) death times supports previous findings that the White race group is clustered differently to other race groups. We note that there is less inequality in H 0 death times for female vs. male faces, despite our p -value indicating that this discrepancy may be significant ( p < 0.05).","PeriodicalId":127261,"journal":{"name":"AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Quantifying Bias in a Face Verification System\",\"authors\":\"Megan Frisella, Pooya Khorrami, J. Matterer, K. Kratkiewicz, P. Torres-Carrasquillo\",\"doi\":\"10.3390/cmsf2022003006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. 
The H0 death times for White face embeddings peak later than those for the other race groups (p < 0.05 for the W × A, W × I, and W × B t-tests), indicating that White embeddings are more spread out in the embedding space. The density peaks of H0 death times for the other race groups are taller and occur earlier than the peak for the White race group. The shorter, wider peak for the White subgroup means that there is more variety (higher variance) in its H0 death times, in contrast to the consistent peak around 0.8, with less variance, for the other race groups. This shows that the distribution of White faces in the embedding space has more variance than those of the other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four similar bell-shaped density plots. Thus, our analysis of the H0 death times supports previous findings that the White race group is clustered differently from the other race groups. We note that there is less inequality in H0 death times for female vs. male faces, although our p-value indicates that this discrepancy may nonetheless be significant (p < 0.05).
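The H0 death times above come from persistent homology: as a distance scale grows, every embedding starts as its own connected component, and a component "dies" at the scale where it merges into another, so dense clusters produce early deaths. The sketch below, a minimal version of this analysis, assumes the `ripser` package and a hypothetical dict `emb_by_group` mapping race-group codes to embedding arrays; the Welch t-test is one plausible reading, since the text only says t-tests were used.

```python
import numpy as np
from ripser import ripser            # pip install ripser
from scipy.stats import ttest_ind

def h0_death_times(embeddings):
    """H0 persistence diagram of a point cloud: death times record the
    scale at which connected components merge (dense clusters die early)."""
    dgm0 = ripser(embeddings, maxdim=0)['dgms'][0]
    deaths = dgm0[:, 1]
    return deaths[np.isfinite(deaths)]   # drop the never-dying final component

# emb_by_group is a hypothetical {group code: embedding array} dict.
deaths = {g: h0_death_times(emb_by_group[g]) for g in ['W', 'A', 'I', 'B']}

# Compare the White group's death-time distribution against each other group,
# mirroring the W x A, W x I, W x B t-tests reported above.
for g in ['A', 'I', 'B']:
    t, p = ttest_ind(deaths['W'], deaths[g], equal_var=False)
    print(f"W x {g}: t={t:.2f}, p={p:.4f}")
```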
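The centroid distance distribution mentioned above can be sketched in the same spirit: the distance from each embedding to its group centroid, smoothed into one density curve per group. Four near-identical bell shapes would match the trend noted in the text. Again, `emb_by_group` and the distance range are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def centroid_distances(embeddings):
    """Distance of each embedding to its group centroid; the density of
    these distances describes how tightly the group clusters."""
    centroid = embeddings.mean(axis=0)
    return np.linalg.norm(embeddings - centroid, axis=1)

# Kernel density estimate of centroid distances per group, evaluated on an
# illustrative distance grid; similar curves across groups indicate that this
# view, unlike the H0 death times, does not separate the White group.
xs = np.linspace(0.0, 2.0, 200)
densities = {g: gaussian_kde(centroid_distances(emb))(xs)
             for g, emb in emb_by_group.items()}
```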