Performance Analysis of Ensemble Learning and Feature Selection Methods in Loan Approval Prediction at Banks

Journal of Artificial Intelligence and Engineering Applications (JAIEA). Pub Date: 2024-02-15. DOI: 10.59934/jaiea.v3i2.426

Iqbal Muhammad, Rizka Dahlia, Muhammad Ifan Rifani Ihsan, Lisnawanty, Rabiatus Sa’adah



Abstract


Banks assess a loan application through a series of evaluations based on applicant data and credit scores to determine whether the borrower is eligible for a loan. Machine learning serves as the basis for evaluating whether an applicant merits a loan, with the aim of reducing the risk the bank faces. This study seeks the highest attainable accuracy on the Loan Approval Prediction dataset, an open dataset published on Kaggle. The dataset has 14 features and 4,269 records, of which 4,173 were usable for analysis (2,599 approved and 1,574 rejected). A feature importance evaluation over the 14 features ranks loan amount as the most important feature and bank asset value as the least influential. The study also compares several decision-tree ensemble models: Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting, Random Forest, Adaptive Boosting (AdaBoost), and Extra Trees. XGBoost performs best, with accuracy 0.9974, AUC 0.9998, recall 0.9963, precision 0.9969, and F1 0.9966.
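To make the reported figures easier to interpret, the following minimal sketch shows how accuracy, precision, recall, and F1 are derived from a binary confusion matrix, and what the class counts in the abstract imply. It is an illustration only, not the authors' code. The `ConfusionMatrix` class and the counts in `example` are hypothetical, and treating "approved" as the positive class is an assumption; the abstract does not publish a confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper).

    Assumption: "approved" is treated as the positive class.
    """
    tp: int  # approved loans predicted as approved
    fp: int  # rejected loans predicted as approved
    fn: int  # approved loans predicted as rejected
    tn: int  # rejected loans predicted as rejected

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted approvals that were truly approved.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of true approvals that the model found.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Class counts as stated in the abstract.
APPROVED, REJECTED, RAW_ROWS = 2_599, 1_574, 4_269

usable_rows = APPROVED + REJECTED        # 4173, matching the abstract
dropped_rows = RAW_ROWS - usable_rows    # 96 records excluded from analysis
# Accuracy of a trivial model that always predicts "approved".
majority_baseline = APPROVED / usable_rows

# Hypothetical counts for illustration only; NOT results from the paper.
example = ConfusionMatrix(tp=90, fp=5, fn=10, tn=95)

if __name__ == "__main__":
    print(f"usable rows: {usable_rows}, dropped rows: {dropped_rows}")
    print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
    print(f"accuracy:  {example.accuracy():.4f}")
    print(f"precision: {example.precision():.4f}")
    print(f"recall:    {example.recall():.4f}")
    print(f"F1:        {example.f1():.4f}")
```

The majority-class baseline gives a reference point: because roughly 62% of the usable records are approvals, any model should be judged against that figure, not against 50%. AUC is omitted here because it requires predicted scores for each record, which a confusion matrix alone does not provide.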
