Features Selection for Credit Risk Prediction Problem

IF 6.9, Zone 3 (Management), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Ines Gasmi, Sana Neji, Salima Smiti, Makram Soui
Journal: Information Systems Frontiers
Volume: 28 1
Publication date: 2025-01-03
Publication type: Journal Article
DOI: 10.1007/s10796-024-10559-x
URL: https://doi.org/10.1007/s10796-024-10559-x
Citations: 0

Features Selection for Credit Risk Prediction Problem

Credit risk assessment has drawn great interest from both researchers and financial institutions. Classifying an applicant as a defaulter or a non-defaulter helps banks make a reasonable decision. Applicants are classified on the basis of historical information about past loans. The data sets used for analysis may include many features, a number of which may be irrelevant to the decision-making process. Keeping irrelevant features or leaving out relevant ones can be harmful, because it produces poor-quality patterns that may lead to confused decisions. Determining an appropriate set of predictors is therefore an important challenge in credit risk prediction research, since a well-chosen set supports better decision-making. The task is to search for the smallest subset of features that provides the highest accuracy and comprehensibility. This study therefore proposes a feature selection-based classification model for credit risk assessment. To this end, five algorithms are applied: Speed-constrained Multi-objective PSO (SMPSO), Non-dominated Sorting Genetic Algorithm II (NSGA-II), Sequential Forward Selection (SFS), Sequential Forward Floating Selection (SFFS), and Random Subset Feature Selection (RSFS). The selected subset is evaluated with three classifiers: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The proposed model is validated on three real-world credit datasets. The results confirm the efficiency of the SMPSO-KNN model in selecting the most significant features and show that it provides the highest classification accuracy compared with existing models.
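
To make the wrapper idea in the abstract concrete, the following is a minimal sketch of Sequential Forward Selection (one of the five search strategies listed) scored by a K-Nearest Neighbors classifier. It is not the authors' implementation: the function names, the leave-one-out scoring, the choice k=3, and the synthetic data (one informative feature plus three noise features) are all assumptions made for illustration. The paper's best-performing search, SMPSO, is a multi-objective method and is not shown here.

```python
import random


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest rows, using only the given feature indices."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], features, k) == y[i]
    return hits / len(X)


def sequential_forward_selection(X, y, n_features, k=3):
    """Greedily add the feature that most improves accuracy; stop when none helps."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scores = {f: loo_accuracy(X, y, selected + [f], k) for f in remaining}
        feat = max(remaining, key=scores.get)  # ties go to the lowest index
        if scores[feat] <= best:
            break  # no candidate improves the score, so keep the subset small
        selected.append(feat)
        remaining.remove(feat)
        best = scores[feat]
    return selected, best


# Synthetic stand-in for a credit dataset (hypothetical, not from the paper):
# feature 0 separates the two classes, features 1-3 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(20)]  # 0 = non-defaulter, 1 = defaulter
X = [[10 * label + rng.random()] + [rng.random() for _ in range(3)] for label in y]

selected, best = sequential_forward_selection(X, y, n_features=4)
print(selected, best)
```

Stopping as soon as no candidate improves the score reflects the stated goal of finding the smallest subset with the highest accuracy. A floating variant such as SFFS would additionally try removing previously selected features after each addition.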

Source journal
Information Systems Frontiers
Category: Engineering Technology - Computer Science: Theory & Methods
CiteScore: 13.30
Self-citation rate: 18.60%
Articles published: 127
Review time: 9 months
About the journal: The interdisciplinary interfaces of Information Systems (IS) are fast emerging as defining areas of research and development in IS. These developments are largely due to the transformation of Information Technology (IT) towards networked worlds and its effects on global communications and economies. While these developments are shaping the way information is used in all forms of human enterprise, they are also setting the tone and pace of information systems of the future. The major advances in IT such as client/server systems, the Internet and the desktop/multimedia computing revolution, for example, have led to numerous important vistas of research and development with considerable practical impact and academic significance. While the industry seeks to develop high performance IS/IT solutions to a variety of contemporary information support needs, academia looks to extend the reach of IS technology into new application domains. Information Systems Frontiers (ISF) aims to provide a common forum of dissemination of frontline industrial developments of substantial academic value and pioneering academic research of significant practical impact.