Performance Analysis of Ensemble Learning and Feature Selection Methods in Loan Approval Prediction at Banks

Iqbal Muhammad, Rizka Dahlia, Muhammad Ifan Rifani Ihsan, Lisnawanty, Rabiatus Sa’adah
Journal of Artificial Intelligence and Engineering Applications (JAIEA), published 2024-02-15. DOI: 10.59934/jaiea.v3i2.426
Citation count: 0

Abstract

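Applying for a bank loan involves a series of assessments, based on applicant data and credit scores, that determine whether a borrower is eligible to receive the loan. Machine learning can serve as the basis for this eligibility assessment and helps reduce the potential risk a bank faces. This study aims to find the model with the highest accuracy on the Loan Approval Prediction dataset, which is publicly available on Kaggle. The dataset has 14 features and 4,269 records. Analysis of these 4,269 records showed that 4,173 were usable for the study (2,599 approved and 1,574 rejected). A feature importance evaluation over the 14 features shows that loan amount is the most important feature and bank asset value has the least influence. Several decision-tree ensemble models were then tested on the dataset: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting, Random Forest, Adaptive Boosting (AdaBoost) and Extra Trees. In this comparison XGBoost performed best, with Accuracy 0.9974, AUC 0.9998, Recall 0.9963, Precision 0.9969 and F1 0.9966.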
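The abstract reports Accuracy, AUC, Recall, Precision ("Prec") and F1 but does not define them or say which class is treated as positive. The minimal Python sketch below uses only the standard library and shows one conventional way to compute these metrics from true labels and model scores. The function names, the 0.5 decision threshold and the choice of "approved" (label 1) as the positive class are illustrative assumptions, not details taken from the paper. The sketch also computes the accuracy of always predicting the majority class from the class counts above (2,599 approved vs 1,574 rejected), which gives a reference point for the reported scores.

```python
# Sketch: the five metrics quoted in the abstract, in plain Python.
# Assumptions (not from the paper): label 1 = "approved" is the positive
# class, and a model score >= 0.5 counts as a predicted approval.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:  # t == 0 and p == 0
            tn += 1
    return tp, fp, fn, tn


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, scores, threshold=0.5):
    """Threshold the scores, then compute the metrics named in the abstract."""
    if not y_true or len(y_true) != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal length")
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "Accuracy": (tp + tn) / len(y_true),
        "AUC": roc_auc(y_true, scores),  # threshold-free, uses raw scores
        "Recall": recall,
        "Prec": precision,
        "F1": f1_from(precision, recall),
    }


def majority_baseline(n_majority, n_minority):
    """Accuracy of always predicting the majority class."""
    return n_majority / (n_majority + n_minority)


if __name__ == "__main__":
    # Toy labels and scores, purely illustrative.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]))
    print(round(majority_baseline(2599, 1574), 4))
    print(round(f1_from(0.9969, 0.9963), 4))
```

With the abstract's class counts, the majority-class baseline is 2,599 / 4,173 ≈ 0.6228, so every reported score should be read against roughly 62% accuracy rather than 50%. The harmonic mean of the reported Precision (0.9969) and Recall (0.9963) also rounds to the reported F1 of 0.9966. The pairwise AUC loop costs O(n_pos × n_neg), which is manageable at this dataset's size. A rank-based computation would scale better to larger data.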