An Explainable ADASYN-Based Focal Loss Approach for Credit Assessment

IF 2.7, Region 3 (Economics), JCR Q1 (ECONOMICS)

Journal of Forecasting, Pub Date: 2025-01-07, DOI: 10.1002/for.3252

Shaukat Ali Shahee, Rujavi Patel

Journal of Forecasting, vol. 44, no. 4, pp. 1513-1530. URL: https://onlinelibrary.wiley.com/doi/10.1002/for.3252

Citations: 0

Abstract

The integration of deep learning techniques with financial technology (fintech) has revolutionized credit risk analysis, a critical component of financial risk management. A pervasive challenge in credit risk assessment is the skewed distribution of data, which hinders accurate prediction, particularly for minority class instances. Various solutions to class imbalance have been proposed in the literature, each with limitations. Focal loss is a well-known loss function that handles class imbalance by tuning the hyperparameter $\gamma$. However, an imbalance remains in the number of hard-to-learn observations between the classes. In this paper, we propose integrating ADASYN with focal loss to mitigate class imbalance and enhance credit scoring accuracy. ADASYN systematically generates synthetic data based on hard-to-learn examples to counter skewed distributions, while focal loss prioritizes the training of challenging examples, fostering more balanced model performance. The approach has been tested on real-world imbalanced datasets and credit assessment data, and the outcomes have been compared against a range of combinations of sampling techniques and loss functions. The results show that the proposed strategy outperforms the other approaches. Although improving the accuracy of credit risk analysis is critical, model interpretability is just as important for enabling financial analysts to make sound decisions. To address this, we measure the global and local contributions of each feature using SHAP (SHapley Additive exPlanations). According to the global interpretability analysis, the top parameters influencing credit risk assessment are checking account status, loan purpose, borrower age, credit history, and interest rate/installment rate. Moreover, local interpretability analysis reveals differences in both the magnitude and the direction of feature contributions. These findings not only broaden our knowledge of credit assessment services but also highlight the important role they could play in attracting new clients and generating income. This paper also shows how the proposed approach can be scaled to other imbalanced real-world datasets, improving model performance in terms of AUC, G-mean, and F-measure.
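The three quantitative ingredients named in the abstract (focal loss, ADASYN's focus on hard-to-learn minority examples, and the G-mean metric) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes the standard binary focal loss $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, the usual ADASYN allocation rule in which each minority example receives synthetic points in proportion to the share of majority-class points among its $k$ nearest neighbours, and G-mean defined as the geometric mean of sensitivity and specificity. The function names, the class-weight parameter `alpha`, the balance parameter `beta`, and all default values are illustrative assumptions; the abstract itself mentions only $\gamma$.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single example (illustrative sketch).

    p: predicted probability of the positive class; y: true label (0 or 1).
    gamma and alpha defaults are assumptions, not values from the paper.
    """
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    p_t = min(max(p_t, eps), 1.0)               # clip to avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def adasyn_allocation(majority_neighbors, k, n_majority, n_minority, beta=1.0):
    """Number of synthetic points assigned to each minority example.

    majority_neighbors[i] is the count of majority-class points among the
    k nearest neighbours of minority example i (assumed precomputed).
    Rounding means the counts may not sum exactly to the target total.
    """
    total_to_generate = (n_majority - n_minority) * beta
    ratios = [m / k for m in majority_neighbors]  # hardness of each example
    norm = sum(ratios)
    if norm == 0:
        return [0] * len(ratios)
    return [round(r / norm * total_to_generate) for r in ratios]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

With `gamma=0` the focal term disappears and the loss reduces to class-weighted cross-entropy, which is a convenient sanity check. In `adasyn_allocation`, a minority example surrounded only by other minority points receives no synthetic samples, which is what makes the oversampling concentrate on hard-to-learn regions.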


Source journal

Journal of Forecasting

CiteScore: 5.40

Self-citation rate: 5.90%

Journal description: The Journal of Forecasting is an international journal that publishes refereed papers on forecasting. It is multidisciplinary, welcoming papers dealing with any aspect of forecasting: theoretical, practical, computational and methodological. A broad interpretation of the topic is taken with approaches from various subject areas, such as statistics, economics, psychology, systems engineering and social sciences, all encouraged. Furthermore, the Journal welcomes a wide diversity of applications in such fields as business, government, technology and the environment. Of particular interest are papers dealing with modelling issues and the relationship of forecasting systems to decision-making processes.