On Lasso and adaptive Lasso for non-random sample in credit scoring

IF 1.2 | Region 4 (Mathematics) | JCR Q2, Statistics & Probability

Statistical Modelling | Pub Date: 2022-05-09 | DOI: 10.1177/1471082x221092181

E. Ogundimu


Citations: 0

Abstract


On Lasso and adaptive Lasso for non-random sample in credit scoring

Prediction models in credit scoring are often formulated using available data on accepted applicants at the loan application stage. Using these data to estimate the probability of default (PD) may lead to bias due to non-random selection from the population of applicants. That is, the PD in the general population of applicants may not be the same as the PD in the subpopulation of accepted applicants. A prominent model for reducing bias in this framework is the sample selection model, but there is no consensus on its utility yet. It is unclear whether the bias-variance trade-off of regularization techniques can improve the predictions of PD in a non-random sample selection setting. To address this, we propose the use of Lasso and adaptive Lasso for variable selection and optimal predictive accuracy. By appealing to the least squares approximation of the likelihood function of the sample selection model, we optimize the resulting function subject to L1 and adaptively weighted L1 penalties using an efficient algorithm. We evaluate the performance of the proposed approach and competing alternatives in a simulation study and apply it to the well-known American Express credit card dataset.
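
The least squares approximation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard form of that approximation: given an unpenalized maximum-likelihood estimate and its estimated covariance, minimize (beta - beta_hat)' inv(cov_hat) (beta - beta_hat) + lam * sum_j w_j |beta_j|. The function names, the coordinate-descent solver, and the weight choice w_j = 1 / |beta_hat_j| are assumptions made for illustration; the abstract does not say which algorithm the paper uses, and fitting the sample selection model itself is not shown.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lsa_lasso(beta_hat, cov_hat, lam, weights=None, n_iter=200, tol=1e-10):
    """Coordinate descent for the least squares approximation objective

        (beta - beta_hat)' A (beta - beta_hat) + lam * sum_j w_j * |beta_j|,

    where A = inv(cov_hat). beta_hat and cov_hat are assumed to come from an
    unpenalized fit of the model (hypothetical inputs, not computed here).
    weights=None gives the plain Lasso; adaptive weights give adaptive Lasso.
    """
    b = np.asarray(beta_hat, dtype=float)
    A = np.linalg.inv(np.asarray(cov_hat, dtype=float))
    w = np.ones_like(b) if weights is None else np.asarray(weights, dtype=float)
    beta = b.copy()
    for _ in range(n_iter):
        old = beta.copy()
        for j in range(b.size):
            d = beta - b
            # Contribution of the other coordinates to the gradient in j.
            r = A[j] @ d - A[j, j] * d[j]
            # Unpenalized minimizer in coordinate j, then shrink it.
            z = b[j] - r / A[j, j]
            beta[j] = soft_threshold(z, lam * w[j] / (2.0 * A[j, j]))
        if np.max(np.abs(beta - old)) < tol:
            break
    return beta


# Illustrative inputs only; these numbers are made up, not from the paper.
beta_hat = np.array([2.0, 0.1, -1.0])
cov_hat = np.eye(3)
lasso_fit = lsa_lasso(beta_hat, cov_hat, lam=1.0)
adaptive_fit = lsa_lasso(beta_hat, cov_hat, lam=1.0,
                         weights=1.0 / np.abs(beta_hat))
```

With an identity covariance the update reduces to soft-thresholding each coefficient at lam * w_j / 2, so the small coefficient is set exactly to zero in both fits, while the adaptive weights shrink the largest coefficient less than the plain Lasso does.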


Source journal

Statistical Modelling (Mathematics: Statistics and Probability)

CiteScore

2.20

Self-citation rate

0.00%

Articles published

Review time

>12 weeks

About the journal: The primary aim of the journal is to publish original and high-quality articles that recognize statistical modelling as the general framework for the application of statistical ideas. Submissions must reflect important developments, extensions, and applications in statistical modelling. The journal also encourages submissions that describe scientifically interesting, complex or novel statistical modelling aspects from a wide diversity of disciplines, and submissions that embrace the diversity of applied statistical modelling.