Golnoosh Babaei, Paolo Giudici, Parvati Neelakantan
Physica A: Statistical Mechanics and its Applications, volume 680, Article 131030. Published 2025-10-15. DOI: 10.1016/j.physa.2025.131030
https://www.sciencedirect.com/science/article/pii/S037843712500682X
Explainability, fairness and the Simpson’s paradox in credit lending
Fairness is a key requirement for artificial intelligence applications. The assessment of fairness is typically based on group-based measures, such as statistical parity, which compares the machine learning output across protected population groups, such as males and females. Although intuitive and simple, statistical parity may be affected by explanatory variables that are correlated with the protected variable. To remove this effect, we propose to replace statistical parity with Shapley values, which measure the difference in output specifically due to the protected variable. This makes it possible to check for the presence of Simpson’s paradox, whereby a fair model may become unfair when conditioning on the explanatory variables. We apply our proposal to a real-world database on credit lending in the state of New York, containing 157,269 personal lending decisions. The empirical findings show that both logistic regression and random forest models are fair when all loan applications are considered, but become unfair when the requested loan amount is high.
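The gap between an aggregate parity check and a conditional one can be illustrated with a small sketch. The code below is not the authors' Shapley-based method and does not use their data: it computes a plain statistical parity gap (difference in approval rates between two groups), first over all applications and then within loan-amount bands. The counts, the group labels, the band names and the helper functions `make_records`, `approval_rate` and `parity_gap` are all hypothetical, chosen only so that the overall gap vanishes while the within-band gaps do not.

```python
# Hypothetical toy counts, NOT the New York lending data from the paper.
# Each entry maps (group, loan_band) to (applicants, approvals).
TOY_COUNTS = {
    ("male", "low"): (100, 80),
    ("male", "high"): (100, 60),
    ("female", "low"): (100, 90),
    ("female", "high"): (100, 50),
}


def make_records(counts):
    """Expand the count table into one (group, band, approved) row per applicant."""
    records = []
    for (group, band), (applicants, approvals) in counts.items():
        records += [(group, band, 1)] * approvals
        records += [(group, band, 0)] * (applicants - approvals)
    return records


def approval_rate(records, group, band=None):
    """Share of approved applications for a group, optionally within one band."""
    outcomes = [
        approved
        for g, b, approved in records
        if g == group and (band is None or b == band)
    ]
    return sum(outcomes) / len(outcomes)


def parity_gap(records, band=None):
    """Statistical parity gap: male approval rate minus female approval rate."""
    return approval_rate(records, "male", band) - approval_rate(records, "female", band)


records = make_records(TOY_COUNTS)
overall_gap = parity_gap(records)
gap_by_band = {band: parity_gap(records, band) for band in ("low", "high")}

print(f"overall gap: {overall_gap:+.2f}")
for band, gap in gap_by_band.items():
    print(f"{band}-amount gap: {gap:+.2f}")
```

In this toy table both groups have an overall approval rate of 0.70, so the aggregate check reports no disparity, while the high-amount band shows a gap of about 0.10 in favour of one group and the low-amount band a gap of about 0.10 in favour of the other. This is the kind of pattern that only appears once the comparison is conditioned on an explanatory variable.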
About the journal:
Physica A: Statistical Mechanics and its Applications
Recognized by the European Physical Society
Physica A publishes research in the field of statistical mechanics and its applications.
Statistical mechanics sets out to explain the behaviour of macroscopic systems by studying the statistical properties of their microscopic constituents.
Applications of the techniques of statistical mechanics are widespread, and include: applications to physical systems such as solids, liquids and gases; applications to chemical and biological systems (colloids, interfaces, complex fluids, polymers and biopolymers, cell physics); and other interdisciplinary applications to, for instance, biological, economic and sociological systems.