Detecting and Processing Unsuspected Sensitive Variables for Robust Machine Learning

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Algorithms Pub Date: 2023-11-07 DOI: 10.3390/a16110510
Laurent Risser, Agustin Martin Picard, Lucas Hervier, Jean-Michel Loubes
{"title":"Detecting and Processing Unsuspected Sensitive Variables for Robust Machine Learning","authors":"Laurent Risser, Agustin Martin Picard, Lucas Hervier, Jean-Michel Loubes","doi":"10.3390/a16110510","DOIUrl":null,"url":null,"abstract":"The problem of algorithmic bias in machine learning has recently gained a lot of attention due to its potentially strong impact on our societies. In much the same manner, algorithmic biases can alter industrial and safety-critical machine learning applications, where high-dimensional inputs are used. This issue has, however, been mostly left out of the spotlight in the machine learning literature. Contrary to societal applications, where a set of potentially sensitive variables, such as gender or race, can be defined by common sense or by regulations to draw attention to potential risks, the sensitive variables are often unsuspected in industrial and safety-critical applications. In addition, these unsuspected sensitive variables may be indirectly represented as a latent feature of the input data. For instance, the predictions of an image classifier may be altered by reconstruction artefacts in a small subset of the training images. This raises serious and well-founded concerns about the commercial deployment of AI-based solutions, especially in a context where new regulations address bias issues in AI. The purpose of our paper is, then, to first give a large overview of recent advances in robust machine learning. Then, we propose a new procedure to detect and to treat such unknown biases. As far as we know, no equivalent procedure has been proposed in the literature so far. The procedure is also generic enough to be used in a wide variety of industrial contexts. Its relevance is demonstrated on a set of satellite images used to train a classifier. In this illustration, our technique detects that a subset of the training images has reconstruction faults, leading to systematic prediction errors that would have been unsuspected using conventional cross-validation techniques.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"45 38","pages":"0"},"PeriodicalIF":1.8000,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a16110510","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The problem of algorithmic bias in machine learning has recently attracted considerable attention due to its potentially strong impact on our societies. In much the same way, algorithmic biases can affect industrial and safety-critical machine learning applications, where high-dimensional inputs are used. This issue has, however, been mostly left out of the spotlight in the machine learning literature. In contrast to societal applications, where a set of potentially sensitive variables, such as gender or race, can be defined by common sense or by regulations to draw attention to potential risks, the sensitive variables are often unsuspected in industrial and safety-critical applications. In addition, these unsuspected sensitive variables may only be represented indirectly, as a latent feature of the input data. For instance, the predictions of an image classifier may be altered by reconstruction artefacts in a small subset of the training images. This raises serious and well-founded concerns about the commercial deployment of AI-based solutions, especially in a context where new regulations address bias issues in AI. The purpose of our paper is therefore, first, to give a broad overview of recent advances in robust machine learning. We then propose a new procedure to detect and treat such unknown biases. To the best of our knowledge, no equivalent procedure has been proposed in the literature so far. The procedure is also generic enough to be used in a wide variety of industrial contexts. Its relevance is demonstrated on a set of satellite images used to train a classifier. In this illustration, our technique detects that a subset of the training images has reconstruction faults, leading to systematic prediction errors that would have gone unsuspected with conventional cross-validation techniques.
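The abstract describes the key mechanism only informally: a model can absorb an unsuspected sensitive variable that is never an explicit column of the data but only a latent feature of the inputs, such as reconstruction artefacts in a subset of images, and the affected subgroup then shows systematic prediction errors that global metrics hide. The sketch below is a hypothetical illustration of that idea, not the authors' actual procedure: it clusters samples in a classifier's latent space and flags clusters whose error rate is far above the global error rate. The function name find_suspect_clusters, the ratio_threshold parameter, and the synthetic data are all assumptions made for this example.

# Hypothetical sketch (not the paper's procedure): surface an unsuspected,
# systematically mispredicted subgroup by clustering latent features and
# comparing per-cluster error rates against the global error rate.
import numpy as np
from sklearn.cluster import KMeans


def find_suspect_clusters(latent_features, y_true, y_pred,
                          n_clusters=10, ratio_threshold=2.0):
    """Return (cluster id, size, error rate) for clusters whose error rate
    is at least ratio_threshold times the global error rate."""
    errors = (y_true != y_pred).astype(float)
    global_error = max(errors.mean(), 1e-8)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(latent_features)
    suspects = []
    for k in range(n_clusters):
        mask = labels == k
        if not mask.any():
            continue
        cluster_error = errors[mask].mean()
        if cluster_error >= ratio_threshold * global_error:
            suspects.append((k, int(mask.sum()), cluster_error))
    return suspects


# Synthetic check: 60 of 1000 samples carry an "artefact" (shifted latent
# features) and are always misclassified, mimicking reconstruction faults
# in a small subset of training images.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 16))            # latent features of 1000 samples
y_true = rng.integers(0, 2, size=1000)     # ground-truth labels
y_pred = y_true.copy()                     # otherwise perfect predictions
artefact = rng.choice(1000, size=60, replace=False)
z[artefact] += 4.0                         # artefact group sits in its own region
y_pred[artefact] = 1 - y_true[artefact]    # ...and is systematically wrong

for k, size, err in find_suspect_clusters(z, y_true, y_pred):
    print(f"cluster {k}: {size} samples, error rate {err:.2f}")

On this toy data the artefact samples end up isolated in a flagged cluster (or two) with an error rate of 1.0, while the remaining clusters stay near zero; the overall accuracy of 94% is exactly the kind of average that would hide the problem under conventional cross-validation.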
Source journal: Algorithms (Mathematics, Numerical Analysis)
CiteScore: 4.10
Self-citation rate: 4.30%
Articles published: 394
Review time: 11 weeks