E-FAIR-DB: Functional Dependencies to Discover Data Bias and Enhance Data Equity

IF 2.9 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Journal of Data and Information Quality Pub Date: 2022-08-04 DOI: 10.1145/3552433

Fabio Azzalini, Chiara Criscuolo, L. Tanca

Publication type: Journal Article. Volume: 9 1. Pages: 1 - 26. Open access: no.

Citations: 2

Abstract

Decisions based on algorithms and systems generated from data have become essential tools that pervade all aspects of our daily lives; for these advances to be reliable, the results should be accurate but should also respect all the facets of data equity [11]. In this context, Fairness and Diversity have become relevant topics of discussion within Data Science Ethics and in Data Science in general. Although data equity is desirable, reconciling it with accurate decision-making involves a critical tradeoff, because applying a repair procedure to restore equity might modify the original data in such a way that the final decision is inaccurate with respect to the ultimate objective of the analysis. In this work, we propose E-FAIR-DB, a novel solution that exploits the notion of Functional Dependency (a type of data constraint) to restore data equity by discovering and resolving discrimination in datasets. The proposed solution is implemented as a pipeline that first mines functional dependencies to detect and evaluate fairness and diversity in the input dataset and then, based on these findings and on the objective of the data analysis, mitigates data bias while minimizing the number of modifications. Through the mined dependencies, our tool can identify the attributes of the database that encompass discrimination (e.g., gender, ethnicity, or religion); then, based on these dependencies, it determines the smallest amount of data that must be added and/or removed to mitigate such bias. We evaluate our proposal both through theoretical considerations and experiments on two real-world datasets.
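To make the pipeline idea concrete, here is a minimal sketch of how a dependency over a protected attribute could be scored and how many tuples a naive repair might remove. This is an illustration under stated assumptions, not the authors' implementation: the toy rows, the function names (`confidence`, `difference`, `removals_needed`), the "confidence difference" score, the greedy removal strategy, and the threshold are all hypothetical choices made for this example.

```python
# Hypothetical toy data; not taken from the paper's datasets.
ROWS = [
    {"gender": "F", "income": "low"},
    {"gender": "F", "income": "low"},
    {"gender": "F", "income": "low"},
    {"gender": "F", "income": "high"},
    {"gender": "M", "income": "low"},
    {"gender": "M", "income": "high"},
    {"gender": "M", "income": "high"},
    {"gender": "M", "income": "high"},
]


def matches(row, pattern):
    """True if the row agrees with every attribute=value pair in pattern."""
    return all(row[k] == v for k, v in pattern.items())


def confidence(rows, lhs, rhs):
    """Fraction of rows matching lhs that also match rhs (0.0 if none match lhs)."""
    lhs_rows = [r for r in rows if matches(r, lhs)]
    if not lhs_rows:
        return 0.0
    return sum(matches(r, rhs) for r in lhs_rows) / len(lhs_rows)


def difference(rows, lhs, rhs, protected):
    """Assumed bias score: confidence of lhs -> rhs minus the confidence of the
    same rule with the protected attributes dropped from lhs."""
    reduced = {k: v for k, v in lhs.items() if k not in protected}
    return confidence(rows, lhs, rhs) - confidence(rows, reduced, rhs)


def removals_needed(rows, lhs, rhs, protected, threshold):
    """Naive greedy repair: delete rows matching both lhs and rhs, one at a
    time, until the score is at or below threshold. Returns the count removed.
    This is a simple baseline, not a proven-minimal repair."""
    remaining = list(rows)
    target = {**lhs, **rhs}
    removed = 0
    while difference(remaining, lhs, rhs, protected) > threshold:
        idx = next((i for i, r in enumerate(remaining) if matches(r, target)), None)
        if idx is None:  # nothing left to remove for this rule
            break
        remaining.pop(idx)
        removed += 1
    return removed


if __name__ == "__main__":
    lhs, rhs = {"gender": "F"}, {"income": "low"}
    print(confidence(ROWS, lhs, rhs))
    print(difference(ROWS, lhs, rhs, {"gender"}))
    print(removals_needed(ROWS, lhs, rhs, {"gender"}, threshold=0.2))
```

In this toy setting, the rule "gender = F implies income = low" holds for three of the four matching rows, while "income = low" holds for half of all rows, so the rule stands out as a candidate for bias. A real system would also need to consider adding tuples and handling many dependencies at once, which this sketch does not attempt.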




Source journal

ACM Journal of Data and Information Quality (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

4.10

Self-citation rate

4.80%

Number of articles published