DSAP:通过数据集的人口统计学比较分析偏差

IF 14.7 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Iris Dominguez-Catena, Daniel Paternain, Mikel Galar
{"title":"DSAP:通过数据集的人口统计学比较分析偏差","authors":"Iris Dominguez-Catena,&nbsp;Daniel Paternain,&nbsp;Mikel Galar","doi":"10.1016/j.inffus.2024.102760","DOIUrl":null,"url":null,"abstract":"<div><div>In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset’s demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at <span><span>https://github.com/irisdominguez/DSAP</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"115 ","pages":"Article 102760"},"PeriodicalIF":14.7000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DSAP: Analyzing bias through demographic comparison of datasets\",\"authors\":\"Iris Dominguez-Catena,&nbsp;Daniel Paternain,&nbsp;Mikel Galar\",\"doi\":\"10.1016/j.inffus.2024.102760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset’s demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at <span><span>https://github.com/irisdominguez/DSAP</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"115 \",\"pages\":\"Article 102760\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253524005384\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524005384","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

最近几年,人工智能(AI)系统日益普及。不幸的是,这些系统可能与人类决策存在许多共同的偏差,包括人口统计偏差。通常情况下,这些偏差可以追溯到用于训练的数据,未经整理的大型数据集已成为常态。尽管我们意识到了这些偏差,但仍然缺乏通用工具来检测、量化和比较不同数据集上的偏差。在这项工作中,我们提出了 DSAP(来自辅助档案的人口统计学相似性),这是一种分两步比较数据集人口组成的方法。首先,DSAP 使用现有的人口估计模型来提取数据集的人口特征。其次,它应用相似度量来比较不同数据集的人口构成。虽然这些单独的组成部分是众所周知的,但将它们联合起来用于人口数据集比较却是新颖的,以前的文献中从未涉及过。这种方法有三个关键应用:识别不同数据集的人口盲点和偏差问题、测量人口偏差以及评估人口随时间的变化。DSAP 可用于具有或不具有明确人口统计信息的数据集,前提是人口统计信息可通过辅助模型(如图像或语音数据集的辅助模型)从样本中导出。为了展示所提方法的实用性,我们考虑了面部表情识别任务,在该任务中,以前曾发现过人口统计学偏差。我们对 20 个不同属性的数据集进行了研究。代码见 https://github.com/irisdominguez/DSAP。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
DSAP: Analyzing bias through demographic comparison of datasets
In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset’s demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Information Fusion
Information Fusion 工程技术-计算机:理论方法
CiteScore
33.20
自引率
4.30%
发文量
161
审稿时长
7.9 months
期刊介绍: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信