Analyzing Fairness in Deepfake Detection With Massively Annotated Databases

Ying Xu, Philipp Terhörst, Marius Pedersen, Kiran Raja
{"title":"Analyzing Fairness in Deepfake Detection With Massively Annotated Databases","authors":"Ying Xu;Philipp Terhörst;Marius Pedersen;Kiran Raja","doi":"10.1109/TTS.2024.3365421","DOIUrl":null,"url":null,"abstract":"In recent years, image and video manipulations with Deepfake have become a severe concern for security and society. Many detection models and datasets have been proposed to detect Deepfake data reliably. However, there is an increased concern that these models and training databases might be biased and, thus, cause Deepfake detectors to fail. In this work, we investigate factors causing biased detection in public Deepfake datasets by (a) creating large-scale demographic and non-demographic attribute annotations with 47 different attributes for five popular Deepfake datasets and (b) comprehensively analysing attributes resulting in AI-bias of three state-of-the-art Deepfake detection backbone models on these datasets. The analysis shows how various attributes influence a large variety of distinctive attributes (from over 65M labels) on the detection performance which includes demographic (age, gender, ethnicity) and non-demographic (hair, skin, accessories, etc.) attributes. The results examined datasets show limited diversity and, more importantly, show that the utilised Deepfake detection backbone models are strongly affected by investigated attributes making them not fair across attributes. The Deepfake detection backbone methods trained on such imbalanced/biased datasets result in incorrect detection results leading to generalisability, fairness, and security issues. Our findings and annotated datasets will guide future research to evaluate and mitigate bias in Deepfake detection techniques. The annotated datasets and the corresponding code are publicly available. The code link is: \n<uri>https://github.com/xuyingzhongguo/DeepFakeAnnotations</uri>\n.","PeriodicalId":73324,"journal":{"name":"IEEE transactions on technology and society","volume":"5 1","pages":"93-106"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10438899","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on technology and society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10438899/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In recent years, image and video manipulations with Deepfake have become a severe concern for security and society. Many detection models and datasets have been proposed to detect Deepfake data reliably. However, there is an increased concern that these models and training databases might be biased and, thus, cause Deepfake detectors to fail. In this work, we investigate factors causing biased detection in public Deepfake datasets by (a) creating large-scale demographic and non-demographic attribute annotations with 47 different attributes for five popular Deepfake datasets and (b) comprehensively analysing the attributes that cause AI-bias in three state-of-the-art Deepfake detection backbone models on these datasets. The analysis shows how a large variety of distinctive attributes (from over 65M labels) influence detection performance, including demographic (age, gender, ethnicity) and non-demographic (hair, skin, accessories, etc.) attributes. The results show that the examined datasets have limited diversity and, more importantly, that the utilised Deepfake detection backbone models are strongly affected by the investigated attributes, making them unfair across attributes. Deepfake detection backbone methods trained on such imbalanced/biased datasets produce incorrect detection results, leading to generalisability, fairness, and security issues. Our findings and annotated datasets will guide future research to evaluate and mitigate bias in Deepfake detection techniques. The annotated datasets and the corresponding code are publicly available at: https://github.com/xuyingzhongguo/DeepFakeAnnotations.
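The fairness analysis the abstract describes amounts to comparing a detector's performance across annotated attribute groups. Below is a minimal sketch of such a per-attribute evaluation, assuming a hypothetical CSV that joins attribute annotations with detector scores; the file name and column names (label, score, gender) are illustrative assumptions, not the paper's actual data format or code.

```python
# Minimal sketch: per-attribute fairness analysis of a Deepfake detector.
# Assumes a hypothetical CSV with one row per image/video and columns:
#   'label'  - ground truth (1 = fake, 0 = real)
#   'score'  - detector output probability of being fake
#   'gender' - one example annotated attribute (any attribute column works)
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("annotations_with_scores.csv")  # hypothetical file

# Overall detection performance.
overall_auc = roc_auc_score(df["label"], df["score"])
print(f"Overall AUC: {overall_auc:.3f}")

# Per-group performance for one attribute: a large AUC gap between
# groups suggests the detector is not fair across that attribute.
group_auc = {}
for group, sub in df.groupby("gender"):
    if sub["label"].nunique() == 2:  # AUC needs both classes present
        group_auc[group] = roc_auc_score(sub["label"], sub["score"])

for group, auc in sorted(group_auc.items(), key=lambda kv: kv[1]):
    print(f"{group:>12}: AUC = {auc:.3f}")

fairness_gap = max(group_auc.values()) - min(group_auc.values())
print(f"AUC gap across groups: {fairness_gap:.3f}")
```

Repeating this loop over each of the 47 annotated attributes yields the kind of per-attribute performance breakdown the paper uses to expose bias in detection backbones.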