Features Selection for Credit Risk Prediction Problem

IF 8.3 | Zone 3 (Management) | Q1 | COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Frontiers Pub Date : 2025-01-03 DOI:10.1007/s10796-024-10559-x

Ines Gasmi, Sana Neji, Salima Smiti, Makram Soui


Citations: 0

Abstract

Credit risk assessment has drawn great interest from both researchers and financial institutions. Classifying an applicant as a defaulter or non-defaulter helps banks make sound lending decisions. The classification is based on historical information about past loans. The datasets used for analysis may include many features, a number of which may be irrelevant to the decision-making process. Keeping irrelevant features or leaving out relevant ones can be harmful, producing poor-quality patterns that lead to confused decisions. Determining an appropriate set of predictors is therefore an important challenge in credit risk prediction research, because it underpins better decision-making. The task is to search for the smallest subset of features that provides the highest accuracy and comprehensibility. This study therefore proposes a feature selection-based classification model for credit risk assessment. To this end, five algorithms are applied: Speed-constrained Multi-objective PSO (SMPSO), Non-dominated Sorting Genetic Algorithm II (NSGA-II), Sequential Forward Selection (SFS), Sequential Forward Floating Selection (SFFS), and Random Subset Feature Selection (RSFS). The selected subset is evaluated using three classifiers: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The proposed model is validated on three real-world credit datasets. The results confirm that the SMPSO-KNN model selects the most significant features and provides the highest classification accuracy compared with existing models.
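
To make the wrapper idea in the abstract concrete, the following is a minimal sketch of Sequential Forward Selection (SFS) scored by a K-Nearest Neighbors classifier. It is not the authors' implementation and does not cover SMPSO, NSGA-II, SFFS, or RSFS. The synthetic data, the 70/30 holdout split, the value of k, and all function names (`knn_predict`, `holdout_accuracy`, `sequential_forward_selection`, `make_toy_credit_data`) are assumptions made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training rows, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def holdout_accuracy(X, y, features, k=3, split=0.7):
    """Accuracy on the last (1 - split) share of rows; an assumed protocol."""
    cut = int(len(X) * split)
    train_X, train_y = X[:cut], y[:cut]
    test_X, test_y = X[cut:], y[cut:]
    hits = sum(
        knn_predict(train_X, train_y, x, features, k) == label
        for x, label in zip(test_X, test_y)
    )
    return hits / len(test_X)


def sequential_forward_selection(X, y, max_features, k=3):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(holdout_accuracy(X, y, selected + [f], k), f) for f in remaining]
        # Prefer higher accuracy; break ties in favour of the lower feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no strict improvement, so keep the smaller subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def make_toy_credit_data(n=200, seed=0):
    """Synthetic stand-in for a credit dataset: feature 0 is informative, 1-4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = defaulter, 0 = non-defaulter (toy labels)
        informative = label * 4.0 + rng.gauss(0, 0.5)
        noise = [rng.gauss(0, 1) for _ in range(4)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


X, y = make_toy_credit_data()
selected, score = sequential_forward_selection(X, y, max_features=3)
print("selected feature indices:", selected)
print("holdout accuracy:", round(score, 3))
```

Stopping as soon as no candidate strictly improves accuracy is one simple way to favour small subsets, which mirrors the abstract's goal of the smallest feature set with the highest accuracy. A multi-objective method such as SMPSO would instead treat subset size and accuracy as separate objectives.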


Source journal

Information Systems Frontiers | Engineering & Technology, Computer Science: Theory & Methods

CiteScore

13.30

Self-citation rate

18.60%

Articles published

127

Review time

9 months

Journal description: The interdisciplinary interfaces of Information Systems (IS) are fast emerging as defining areas of research and development in IS. These developments are largely due to the transformation of Information Technology (IT) towards networked worlds and its effects on global communications and economies. While these developments are shaping the way information is used in all forms of human enterprise, they are also setting the tone and pace of information systems of the future. The major advances in IT such as client/server systems, the Internet and the desktop/multimedia computing revolution, for example, have led to numerous important vistas of research and development with considerable practical impact and academic significance. While the industry seeks to develop high performance IS/IT solutions to a variety of contemporary information support needs, academia looks to extend the reach of IS technology into new application domains. Information Systems Frontiers (ISF) aims to provide a common forum of dissemination of frontline industrial developments of substantial academic value and pioneering academic research of significant practical impact.