Stephanie Kelley, Anton Ovchinnikov, D. Hardoon, Adrienne Heinrich
Antidiscrimination Laws, Artificial Intelligence, and Gender Bias: A Case Study in Nonmortgage Fintech Lending
Manuf. Serv. Oper. Manag., pp. 3039-3059. Published 2022-05-18.
DOI: 10.1287/msom.2022.1108 (https://doi.org/10.1287/msom.2022.1108)
Citations: 4
Abstract
Problem definition: We use a realistically large, publicly available data set from a global fintech lender to simulate the impact of different antidiscrimination laws, and their corresponding data management and model-building regimes, on gender-based discrimination in the nonmortgage fintech lending setting.

Academic/practical relevance: Our paper extends the conceptual understanding of model-based discrimination from computer science to a realistic context that simulates the situations fintech lenders face in practice, where advanced machine learning (ML) techniques are used with high-dimensional, feature-rich, highly multicollinear data. We provide technically and legally permissible approaches for firms to reduce discrimination across different antidiscrimination regimes whilst managing profitability.

Methodology: We train statistical and ML models on a large and realistically rich publicly available data set to simulate different antidiscrimination regimes and measure their impact on model quality and firm profitability. We use ML explainability techniques to understand the drivers of ML discrimination.

Results: We find that regimes that prohibit the use of gender (like those in the United States) substantially increase discrimination and slightly decrease firm profitability. We observe that ML models are less discriminatory, of better predictive quality, and more profitable than traditional statistical models like logistic regression. Unlike omitted variable bias, which drives discrimination in statistical models, ML discrimination is driven by changes in the model training procedure, including feature engineering and feature selection, when gender is excluded. We observe that downsampling the training data to rebalance gender, gender-aware hyperparameter selection, and upsampling the training data to rebalance gender all reduce discrimination, with varying trade-offs in predictive quality and firm profitability. Probabilistic gender proxy modeling (imputing applicant gender) further reduces discrimination with negligible impact on predictive quality and a slight increase in firm profitability.

Managerial implications: Antidiscrimination laws need to be rethought, specifically with respect to the collection and use of protected attributes for ML models. Firms should be able to collect protected attributes to, at minimum, measure discrimination and, ideally, to take steps to reduce it. Increased data access should come with greater accountability for firms.
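
To make the rebalancing idea in the Results concrete, the following is a minimal sketch of downsampling a training set so that each gender group is equally represented, together with a simple approval-rate gap between groups. It is an illustration only, not the authors' procedure: the function names, the field names ("gender", "approved"), the toy data, and the use of an approval-rate gap as the discrimination measure are all assumptions, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def downsample_by_group(rows, group_key, seed=0):
    """Randomly drop rows so every group is as small as the smallest one.

    Hypothetical helper: the paper's actual resampling procedure is not
    described in the abstract.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    target = min(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        # Sample without replacement down to the smallest group's size.
        balanced.extend(rng.sample(members, target))
    rng.shuffle(balanced)
    return balanced


def approval_rate_gap(rows, group_key, decision_key):
    """Largest difference in approval rate between any two groups.

    Assumed metric for illustration; the paper may measure
    discrimination differently.
    """
    approved = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[group_key]] += 1
        approved[row[group_key]] += 1 if row[decision_key] else 0
    rates = [approved[g] / total[g] for g in total]
    return max(rates) - min(rates)


# Toy data (hypothetical): 6 male and 2 female applicants.
applicants = (
    [{"gender": "M", "approved": True} for _ in range(4)]
    + [{"gender": "M", "approved": False} for _ in range(2)]
    + [{"gender": "F", "approved": True}, {"gender": "F", "approved": False}]
)

balanced = downsample_by_group(applicants, "gender")
gap = approval_rate_gap(applicants, "gender", "approved")
```

In this sketch the balanced set is what a model would be trained on, while the gap would be computed on the model's decisions for held-out applicants. Measuring the gap at all requires the gender attribute to be available, which is the point the managerial implications make.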