Title: Why Do Models that Predict Failure Fail?
Authors: Hua Kiefer, Tom Mayock
Journal: Risk Management & Analysis in Financial Institutions eJournal
Publication type: Journal Article
Publication date: 2020-06-02
DOI: https://doi.org/10.2139/ssrn.3616889
Citations: 2
Abstract
In the first portion of this paper, we utilize millions of loan-level servicing records for mortgages originated between 2004 and 2016 to study the performance of predictive models of mortgage default. We find that the logistic regression model -- the traditional workhorse for consumer credit modeling -- as well as machine learning methods can be very inaccurate when used to predict loan performance in out-of-time samples. Importantly, we find that this model failure was not unique to the early-2000s housing boom.
We use the Panel Study of Income Dynamics in the second part of our paper to provide evidence that this model failure can be attributed to intertemporal heterogeneity in the relationship between variables that are frequently used to predict mortgage performance and the realized post-origination path of variables that have been shown to trigger mortgage default. Our findings imply that model instability is a significant source of risk for lenders, such as financial technology firms ("Fintechs"), that rely heavily on predictive statistical models and machine learning algorithms for underwriting and account management.
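To make the idea of out-of-time failure concrete, here is a minimal sketch on synthetic data. It is not the authors' data, model specification, or evaluation method. It assumes a single standardized risk score per loan and a hypothetical data-generating process in which the slope linking that score to default weakens between the training period and a later period, which stands in for the intertemporal heterogeneity described above. All function names, sample sizes, and coefficient values are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_vintage(rng, n, slope, intercept=-2.0):
    # Hypothetical loans: (risk score, default flag) drawn from a logistic model.
    loans = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 if rng.random() < sigmoid(intercept + slope * x) else 0.0
        loans.append((x, y))
    return loans


def fit_logit(loans, iters=25):
    # Maximum-likelihood logistic regression (intercept a, slope b) by Newton's method.
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in loans:
            p = sigmoid(a + b * x)
            v = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def log_loss(params, loans):
    a, b = params
    total = 0.0
    for x, y in loans:
        p = min(max(sigmoid(a + b * x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(loans)


def excess_loss(stale_params, loans):
    # Loss of the previously fitted model minus the loss of a model refit on this sample.
    return log_loss(stale_params, loans) - log_loss(fit_logit(loans), loans)


rng = random.Random(0)
train = simulate_vintage(rng, 5000, slope=1.5)        # training period
in_time = simulate_vintage(rng, 5000, slope=1.5)      # holdout, same period
out_of_time = simulate_vintage(rng, 5000, slope=0.3)  # later period, weaker relationship

model = fit_logit(train)
excess_in_time = excess_loss(model, in_time)
excess_out_of_time = excess_loss(model, out_of_time)
print(f"excess log loss, in-time holdout: {excess_in_time:.4f}")
print(f"excess log loss, out-of-time:     {excess_out_of_time:.4f}")
```

The comparison uses excess log loss (the stale model's loss minus the loss of a model refit on the evaluation sample) rather than raw loss, so the two periods are measured against their own best achievable fit. A ranking metric such as AUC would not reveal this particular shift, because a weaker slope with the same sign leaves the ordering of loans unchanged.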