Huan Tian, Bo Liu, Tianqing Zhu, Wanlei Zhou, Philip S. Yu
Title: Distilling Fair Representations From Fair Teachers
DOI: 10.1109/TBDATA.2024.3460532
Journal: IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1419-1433
Publication date: 2024-09-13 (Journal Article)
JCR: Q1 (Computer Science, Information Systems); Impact factor: 7.5
URL: https://ieeexplore.ieee.org/document/10679895/
Citation count: 0
Abstract
As an increasing number of data-driven deep learning models are deployed in our daily lives, algorithmic fairness has become a major concern. These models are trained on data that inevitably contains various biases, leading them to learn representations that differ across demographic subgroups and, in turn, to make unfair predictions. Previous work on fairness has attempted to remove subgroup information from learned features, aiming to produce similar representations across subgroups and thereby fairer predictions. However, identifying and removing this information is extremely challenging due to the “black box” nature of neural networks. Moreover, removing the targeted features without affecting others is difficult, as features are often correlated, so the removal can harm model prediction performance. This paper aims to learn fair representations without degrading model prediction performance. We adopt knowledge distillation, allowing unfair models to learn fair representations directly from a fair teacher. The proposed method provides a novel approach to obtaining fair representations while maintaining valid prediction performance. We evaluate the proposed method, FairDistill, on four datasets (CIFAR-10, UTKFace, CelebA, and Adult) under diverse settings. Extensive experiments demonstrate the effectiveness and robustness of the proposed method.
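The abstract's core idea, distilling fair representations from a fair teacher, can be sketched as a training objective that combines the usual task loss with a feature-matching term pulling the student's representations toward the teacher's. The function below is a minimal illustrative sketch in NumPy, not the paper's actual FairDistill implementation; the name `distillation_loss` and the weighting parameter `alpha` are assumptions for illustration.

```python
import numpy as np

def distillation_loss(student_feats, teacher_feats, logits, labels, alpha=0.5):
    """Illustrative sketch (not the paper's exact loss): task cross-entropy
    plus a feature-distillation term toward a fair teacher's representations.
    `alpha` (assumed name) balances prediction accuracy against fairness transfer."""
    # Task loss: numerically stable softmax cross-entropy on the student's logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    task_loss = -log_probs[np.arange(len(labels)), labels].mean()
    # Distillation term: mean squared distance between student and
    # fair-teacher features, encouraging the student to inherit
    # the teacher's subgroup-invariant representations.
    distill_loss = np.mean((student_feats - teacher_feats) ** 2)
    return task_loss + alpha * distill_loss
```

In this sketch, the student keeps its own classification head (so prediction performance is preserved by the task loss) while the feature term transfers the teacher's fairness properties; the paper's actual formulation should be consulted for the real objective.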
About the Journal
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.