{"title":"公平干预的现实表现——为公平ML引入一个新的基准数据集","authors":"Daphne Lenders, T. Calders","doi":"10.1145/3555776.3577634","DOIUrl":null,"url":null,"abstract":"Some researchers evaluate their fair Machine Learning (ML) algorithms by simulating data with a fair and biased version of its labels. The fair labels reflect what labels individuals deserve, while the biased labels reflect labels obtained through a biased decision process. Given such data, fair algorithms are evaluated by measuring how well they can predict the fair labels, after being trained on the biased ones. The big problem with these approaches is, that they are based on simulated data, which is unlikely to capture the full complexity and noise of real-life decision problems. In this paper, we show how we created a new, more realistic dataset with both fair and biased labels. For this purpose, we started with an existing dataset containing information about high school students and whether they passed an exam or not. Through a human experiment, where participants estimated the school performance given some description of these students, we collect a biased version of these labels. We show how this new dataset can be used to evaluate fair ML algorithms, and how some fairness interventions, that perform well in the traditional evaluation schemes, do not necessarily perform well with respect to the unbiased labels in our dataset, leading to new insights into the performance of debiasing techniques.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Real-life Performance of Fairness Interventions - Introducing A New Benchmarking Dataset for Fair ML\",\"authors\":\"Daphne Lenders, T. Calders\",\"doi\":\"10.1145/3555776.3577634\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Some researchers evaluate their fair Machine Learning (ML) algorithms by simulating data with a fair and biased version of its labels. The fair labels reflect what labels individuals deserve, while the biased labels reflect labels obtained through a biased decision process. Given such data, fair algorithms are evaluated by measuring how well they can predict the fair labels, after being trained on the biased ones. The big problem with these approaches is, that they are based on simulated data, which is unlikely to capture the full complexity and noise of real-life decision problems. In this paper, we show how we created a new, more realistic dataset with both fair and biased labels. For this purpose, we started with an existing dataset containing information about high school students and whether they passed an exam or not. Through a human experiment, where participants estimated the school performance given some description of these students, we collect a biased version of these labels. 
We show how this new dataset can be used to evaluate fair ML algorithms, and how some fairness interventions, that perform well in the traditional evaluation schemes, do not necessarily perform well with respect to the unbiased labels in our dataset, leading to new insights into the performance of debiasing techniques.\",\"PeriodicalId\":42971,\"journal\":{\"name\":\"Applied Computing Review\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2023-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Computing Review\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3555776.3577634\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555776.3577634","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Real-life Performance of Fairness Interventions - Introducing A New Benchmarking Dataset for Fair ML
Some researchers evaluate their fair Machine Learning (ML) algorithms on simulated data containing both a fair and a biased version of the labels. The fair labels reflect the labels individuals actually deserve, while the biased labels reflect the outcomes of a biased decision process. Given such data, fair algorithms are evaluated by measuring how well they predict the fair labels after being trained on the biased ones. The big problem with these approaches is that they rely on simulated data, which is unlikely to capture the full complexity and noise of real-life decision problems. In this paper, we show how we created a new, more realistic dataset with both fair and biased labels. For this purpose, we started from an existing dataset containing information about high school students and whether or not they passed an exam. Through a human experiment, in which participants estimated the school performance of these students given short descriptions of them, we collected a biased version of these labels. We show how this new dataset can be used to evaluate fair ML algorithms, and how some fairness interventions that perform well under traditional evaluation schemes do not necessarily perform well with respect to the unbiased labels in our dataset, leading to new insights into the performance of debiasing techniques.
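The evaluation protocol the abstract describes (train on biased labels, score against fair labels) can be made concrete in a few lines. Below is a minimal sketch in Python with scikit-learn; the file name students.csv, the column names y_biased, y_fair, and sex, and the choice of classifier are illustrative assumptions, not the paper's actual schema or method.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical schema: numeric features, a binary sensitive attribute "sex",
# a biased label column "y_biased", and a fair label column "y_fair".
df = pd.read_csv("students.csv")  # hypothetical file name
X = df.drop(columns=["y_biased", "y_fair"])
s = df["sex"]

X_tr, X_te, yb_tr, yb_te, yf_tr, yf_te, s_tr, s_te = train_test_split(
    X, df["y_biased"], df["y_fair"], s, test_size=0.3, random_state=0
)

# Train on the biased labels, as a real-world system would be.
model = LogisticRegression(max_iter=1000).fit(X_tr, yb_tr)
pred = model.predict(X_te)

# Traditional evaluation: agreement with the (biased) observed labels.
print("accuracy vs. biased labels:", accuracy_score(yb_te, pred))

# Evaluation enabled by a dataset with fair labels: agreement with them.
print("accuracy vs. fair labels:", accuracy_score(yf_te, pred))

# Demographic parity difference: gap in positive-prediction rates across groups.
rates = pd.Series(pred, index=X_te.index).groupby(s_te).mean()
print("demographic parity difference:", rates.max() - rates.min())
```

The paper's central observation shows up in the gap between the two accuracy numbers: a fairness intervention can score well against the biased observed labels while doing little to improve agreement with the fair ones.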