Detecting and Processing Unsuspected Sensitive Variables for Robust Machine Learning

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Algorithms Pub Date: 2023-11-07 DOI: 10.3390/a16110510
Laurent Risser, Agustin Martin Picard, Lucas Hervier, Jean-Michel Loubes
{"title":"Detecting and Processing Unsuspected Sensitive Variables for Robust Machine Learning","authors":"Laurent Risser, Agustin Martin Picard, Lucas Hervier, Jean-Michel Loubes","doi":"10.3390/a16110510","DOIUrl":null,"url":null,"abstract":"The problem of algorithmic bias in machine learning has recently gained a lot of attention due to its potentially strong impact on our societies. In much the same manner, algorithmic biases can alter industrial and safety-critical machine learning applications, where high-dimensional inputs are used. This issue has, however, been mostly left out of the spotlight in the machine learning literature. Contrary to societal applications, where a set of potentially sensitive variables, such as gender or race, can be defined by common sense or by regulations to draw attention to potential risks, the sensitive variables are often unsuspected in industrial and safety-critical applications. In addition, these unsuspected sensitive variables may be indirectly represented as a latent feature of the input data. For instance, the predictions of an image classifier may be altered by reconstruction artefacts in a small subset of the training images. This raises serious and well-founded concerns about the commercial deployment of AI-based solutions, especially in a context where new regulations address bias issues in AI. The purpose of our paper is, then, to first give a large overview of recent advances in robust machine learning. Then, we propose a new procedure to detect and to treat such unknown biases. As far as we know, no equivalent procedure has been proposed in the literature so far. The procedure is also generic enough to be used in a wide variety of industrial contexts. Its relevance is demonstrated on a set of satellite images used to train a classifier. In this illustration, our technique detects that a subset of the training images has reconstruction faults, leading to systematic prediction errors that would have been unsuspected using conventional cross-validation techniques.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"45 38","pages":"0"},"PeriodicalIF":1.8000,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a16110510","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The problem of algorithmic bias in machine learning has recently attracted considerable attention due to its potentially strong impact on our societies. In much the same way, algorithmic biases can affect industrial and safety-critical machine learning applications, where high-dimensional inputs are used. This issue has, however, been mostly left out of the spotlight in the machine learning literature. In contrast to societal applications, where a set of potentially sensitive variables, such as gender or race, can be defined by common sense or by regulations to draw attention to potential risks, the sensitive variables are often unsuspected in industrial and safety-critical applications. In addition, these unsuspected sensitive variables may only be represented indirectly, as a latent feature of the input data. For instance, the predictions of an image classifier may be altered by reconstruction artefacts in a small subset of the training images. This raises serious and well-founded concerns about the commercial deployment of AI-based solutions, especially in a context where new regulations address bias issues in AI. The purpose of our paper is therefore, first, to give a broad overview of recent advances in robust machine learning. We then propose a new procedure to detect and treat such unknown biases. To the best of our knowledge, no equivalent procedure has been proposed in the literature so far. The procedure is also generic enough to be used in a wide variety of industrial contexts. Its relevance is demonstrated on a set of satellite images used to train a classifier. In this illustration, our technique detects that a subset of the training images has reconstruction faults, leading to systematic prediction errors that would have gone unsuspected with conventional cross-validation techniques.
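The abstract describes the key mechanism only informally: a model can absorb an unsuspected sensitive variable that is never an explicit column of the data but only a latent feature of the inputs, such as reconstruction artefacts in a subset of images, and the affected subgroup then shows systematic prediction errors that global metrics hide. The sketch below is a hypothetical illustration of that idea, not the authors' actual procedure: it clusters samples in a classifier's latent space and flags clusters whose error rate is far above the global error rate. The function name find_suspect_clusters, the ratio_threshold parameter, and the synthetic data are all assumptions made for this example.

# Hypothetical sketch (not the paper's procedure): surface an unsuspected,
# systematically mispredicted subgroup by clustering latent features and
# comparing per-cluster error rates against the global error rate.
import numpy as np
from sklearn.cluster import KMeans


def find_suspect_clusters(latent_features, y_true, y_pred,
                          n_clusters=10, ratio_threshold=2.0):
    """Return (cluster id, size, error rate) for clusters whose error rate
    is at least ratio_threshold times the global error rate."""
    errors = (y_true != y_pred).astype(float)
    global_error = max(errors.mean(), 1e-8)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(latent_features)
    suspects = []
    for k in range(n_clusters):
        mask = labels == k
        if not mask.any():
            continue
        cluster_error = errors[mask].mean()
        if cluster_error >= ratio_threshold * global_error:
            suspects.append((k, int(mask.sum()), cluster_error))
    return suspects


# Synthetic check: 60 of 1000 samples carry an "artefact" (shifted latent
# features) and are always misclassified, mimicking reconstruction faults
# in a small subset of training images.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 16))            # latent features of 1000 samples
y_true = rng.integers(0, 2, size=1000)     # ground-truth labels
y_pred = y_true.copy()                     # otherwise perfect predictions
artefact = rng.choice(1000, size=60, replace=False)
z[artefact] += 4.0                         # artefact group sits in its own region
y_pred[artefact] = 1 - y_true[artefact]    # ...and is systematically wrong

for k, size, err in find_suspect_clusters(z, y_true, y_pred):
    print(f"cluster {k}: {size} samples, error rate {err:.2f}")

On this toy data the artefact samples end up isolated in a flagged cluster (or two) with an error rate of 1.0, while the remaining clusters stay near zero; the overall accuracy of 94% is exactly the kind of average that would hide the problem under conventional cross-validation.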
Source journal: Algorithms (Mathematics, Numerical Analysis)
CiteScore: 4.10
Self-citation rate: 4.30%
Articles published: 394
Review time: 11 weeks