Will they repay their debt? Identification of borrowers likely to be charged off

Impact factor: 1.9; JCR quartile: Q3 (Business)
R. Caplescu, A. Panaite, D. Pele, V. Strat
Journal: Management & Marketing-Challenges for the Knowledge Society
Publication type: Journal Article
Published: 2020-07-22
DOI: 10.2478/mmcks-2020-0023 (https://doi.org/10.2478/mmcks-2020-0023)
Citations: 3

Abstract

The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones, in order to mitigate risk for both lenders and platforms. The rapidly growing body of literature provides several comparisons between models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate, both because of its good performance and because of its high explainability. The present paper compares four pairs of models (each fitted on imbalanced and on under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric because it balances the interests of the lenders and those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important predictors, positively related to the risk of charge-off. At the other end of the spectrum, the FICO score has by far the strongest impact on charge-off probability. The final number of features retained by the two Logistic Regression models differs considerably because, although both use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression uses stronger regularization. The analysis was performed in Python (numpy, pandas, sklearn and imblearn).
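The abstract relies on two ideas: under-sampling the majority class to balance the data, and scoring by F1. The following dependency-free sketch illustrates both. It is not the authors' code: the paper reports using sklearn and imblearn, whereas the helper names `f1_score` and `random_under_sample`, the toy labels and the example predictions below are all assumptions made for illustration.

```python
import random
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        sampled.extend((row, label) for row in rng.sample(group, n_min))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


# Hypothetical toy data: 1 = charged off (minority), 0 = fully paid (majority).
labels = [1] * 10 + [0] * 90
rows = [[i] for i in range(100)]

balanced_rows, balanced_labels = random_under_sample(rows, labels)
class_counts = Counter(balanced_labels)  # 10 rows per class after balancing

# Always predicting "fully paid" is 90% accurate here, yet its F1 is 0.
always_paid = [0] * len(labels)
accuracy_always_paid = sum(t == p for t, p in zip(labels, always_paid)) / len(labels)
f1_always_paid = f1_score(labels, always_paid)

# Hypothetical predictions: 8 of 10 charge-offs caught, plus 4 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 86
f1_example = f1_score(labels, y_pred)  # precision 8/12, recall 8/10

print(class_counts, accuracy_always_paid, f1_always_paid, round(f1_example, 3))
```

In the toy example, missed charge-offs (false negatives) and false alarms (false positives) both lower F1, which is one way to read the abstract's remark that the metric balances the interests of lenders and of the platform. In practice, library routines such as imblearn's `RandomUnderSampler` and sklearn's `f1_score` would normally replace these hand-written helpers.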
Source journal metrics: CiteScore 6.20; self-citation rate 2.70%; articles published: 25; review time: 10 weeks