Pitfalls in machine learning interpretability: Manipulating partial dependence plots to hide discrimination

IF 2.2 · Zone 2 (Economics) · JCR Q2 (Economics)

Insurance: Mathematics and Economics, Vol. 125, Article 103135. Pub Date: 2025-07-30. DOI: 10.1016/j.insmatheco.2025.103135

Xi Xin, Giles Hooker, Fei Huang


Citations: 0

Abstract

The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. The framework modifies the original black-box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behavior while preserving most of the original model's predictions. A single manipulated model can produce multiple fooled PD plots. Using real-world datasets, including an auto insurance claims dataset and the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, we show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral through interpretation tools such as PD plots, while retaining almost all of the original black-box model's predictions. Based on these findings, we provide managerial insights for regulators and practitioners.
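To make the mechanism described in the abstract concrete, here is a minimal pure-Python toy. It is not the authors' framework. It assumes the standard empirical PD estimate, PD_j(v) = (1/n) Σ_i f(x_i with feature j replaced by v), and uses three hypothetical features: a protected attribute s, a proxy z that equals s on every observed row, and a legitimate rating factor a. A hypothetical detector treats any input with s != z as belonging to the extrapolation domain. The manipulated model matches the original on every observed row but answers off-manifold queries as if s equalled z, which flattens the PD curve for s.

```python
from statistics import mean

# Hypothetical toy rows (s, z, a): s is a protected attribute, z is a proxy
# that equals s on every observed row, and a is a legitimate rating factor.
DATA = [
    (0, 0, 1.0), (0, 0, 2.0), (0, 0, 3.0),
    (1, 1, 1.0), (1, 1, 2.0),
]


def original_model(s, z, a):
    """Discriminatory black box: adds 2.0 to the prediction when s == 1."""
    return 1.0 + 2.0 * s + 0.5 * a


def on_manifold(s, z, a):
    """Hypothetical detector: s != z never occurs in DATA, so such inputs
    can only come from the value substitution a PD plot performs."""
    return s == z


def manipulated_model(s, z, a):
    """Matches original_model on observed-looking inputs; on extrapolated
    inputs it answers as if s had been set equal to the proxy z."""
    if on_manifold(s, z, a):
        return original_model(s, z, a)
    return original_model(z, z, a)


def partial_dependence(model, feature_index, grid, data):
    """PD(v): average prediction after setting feature_index to v in every row."""
    curve = []
    for v in grid:
        preds = []
        for row in data:
            x = list(row)
            x[feature_index] = v
            preds.append(model(*x))
        curve.append(mean(preds))
    return curve


if __name__ == "__main__":
    grid = [0, 1]
    pd_orig = partial_dependence(original_model, 0, grid, DATA)
    pd_fake = partial_dependence(manipulated_model, 0, grid, DATA)
    print([round(v, 3) for v in pd_orig])  # [1.9, 3.9]: clear effect of s
    print([round(v, 3) for v in pd_fake])  # [2.7, 2.7]: flat, looks neutral
    # Predictions on the observed rows are unchanged.
    print(all(manipulated_model(*r) == original_model(*r) for r in DATA))  # True
```

In this toy the hidden effect simply moves onto the proxy z, so a PD plot for z would still expose it. The paper describes producing multiple fooled PD plots from one model, and this sketch does not attempt that. It only shows the underlying weakness: PD averages over feature combinations that may never occur in the data, and a model can behave arbitrarily there without changing any prediction on observed rows.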


Source journal

Insurance: Mathematics and Economics
Subject area: Management Science; Mathematics, Interdisciplinary Applications

CiteScore: 3.40
Self-citation rate: 15.80%
Review time: 17.3 weeks

Journal description: Insurance: Mathematics and Economics publishes leading research spanning all fields of actuarial science. It appears six times per year and is the world's largest journal in actuarial science research. Insurance: Mathematics and Economics is an international academic journal that aims to strengthen communication between individuals and groups who develop and apply research results in actuarial science. The journal feels a particular obligation to facilitate closer cooperation between those who conduct research in insurance mathematics and quantitative insurance economics and practicing actuaries who are interested in implementing the results. To this end, Insurance: Mathematics and Economics publishes high-quality articles of broad international interest, concerned either with the theory of insurance mathematics and quantitative insurance economics or with its inventive application, including empirical or experimental results. Articles that combine several of these aspects are particularly welcome.