Machine Learning for Credit Risk Prediction: A Systematic Literature Review

Data, published 2023-11-07. DOI: 10.3390/data8110169
Jomark Pablo Noriega, Luis Antonio Rivera, José Alfredo Herrera

Abstract

In this systematic review of the literature on using Machine Learning (ML) for credit risk prediction, we argue that financial institutions need Artificial Intelligence (AI) and ML to assess credit risk, because doing so requires analyzing large volumes of information. We posed research questions about the algorithms, metrics, results, datasets, variables, and limitations involved in predicting credit risk. To answer them, we searched well-known bibliographic databases and identified 52 relevant studies in the microfinance credit industry. We identified challenges and approaches in credit risk prediction using ML models; the difficulties associated with the implemented models include their black-box nature, the need for explainable artificial intelligence, the importance of selecting relevant features, multicollinearity, and imbalance in the input data. In answering the research questions, we found that the Boosted category is the most researched family of ML models, and that the most commonly used evaluation metrics are Area Under the Curve (AUC), Accuracy (ACC), Recall, F1 score (F1), and Precision. Studies mainly use public datasets to compare models and private datasets to generate new knowledge when models are applied in the real world. The most significant limitation identified is how well the data represent reality, and the variables primarily used in the microcredit industry relate to the Demographic, Operation, and Payment behavior categories. This study aims to inform developers of credit risk management tools and software about the current capabilities of the ML methods, metrics, and techniques used to forecast credit risk, thereby helping to minimize potential losses from default and to guide risk appetite.
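The abstract lists AUC, accuracy, recall, F1, and precision as the most common evaluation metrics and names class imbalance as a recurring problem. As a minimal sketch that is not taken from any of the reviewed studies, the Python below computes these metrics from first principles. It assumes label 1 marks a defaulted loan, label 0 a repaid loan, and that `scores` are model-estimated default probabilities; the function names, the 0.5 threshold, and the toy portfolio are hypothetical and chosen only for illustration.

```python
"""Minimal, dependency-free versions of common credit-risk evaluation metrics.

Assumptions (illustrative only): label 1 marks a loan that defaulted, label 0
a loan that was repaid, and `scores` are model-estimated default probabilities.
"""
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 (default) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # set to 0 when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random defaulter receives a higher
    score than a random non-defaulter, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical imbalanced portfolio: 8 repaid loans, 2 defaults.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.60, 0.70, 0.35]

    # Baseline that predicts "repaid" for every applicant.
    print(threshold_metrics(y_true, [0] * len(y_true)))
    # -> {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

    # Same scores cut at an assumed 0.5 threshold.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(threshold_metrics(y_true, y_pred))
    # -> {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
    print(roc_auc(y_true, scores))  # -> 0.875
```

On this toy data, the always-repay baseline and the thresholded model both reach 0.8 accuracy, yet only the latter identifies any defaulter (recall 0.5). This is one reason accuracy alone can mislead on imbalanced credit data, and why threshold-free measures such as AUC are often reported alongside it.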