Multilayer Perceptron Optimization on Imbalanced Data Using SVM-SMOTE and One-Hot Encoding for Credit Card Default Prediction

Journal of Advances in Information Systems and Technology. Pub Date: 2022-09-06. DOI: 10.15294/jaist.v3i2.57061

Adi Sakti Almajid


Citations: 3

Abstract

Credit risk assessment, which classifies prospective users by their likelihood of default, is an important step in reducing the number of defaulting users. Real-world datasets for this task are typically imbalanced, which biases the trained model toward the majority class: the algorithm focuses on the majority class and neglects the minority class, even though both classes are equally important. To address this problem, one-hot encoding (OHE) is combined with the SVM-Synthetic Minority Oversampling Technique (SVM-SMOTE) to optimize a multilayer perceptron (MLP) classifier. OHE encodes the nominal categorical values, and SVM-SMOTE performs the oversampling. The optimized MLP is then compared with the baseline using the AUC score. The data are the default of credit card clients dataset from Taiwan, which has 30,000 instances. The highest AUC score achieved by the optimized MLP is 0.7184, an increase of 0.2179 over the baseline.
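
As a rough illustration of two pieces named in the abstract, the sketch below shows one-hot encoding of a nominal column and a rank-based AUC computation in plain Python. The column values, labels, and scores are made-up toy data, and the helper names `one_hot` and `auc` are hypothetical; the SVM-SMOTE oversampling and the MLP training are not reproduced here. The last lines derive the baseline AUC implied by the reported figures (0.7184 - 0.2179 = 0.5005), which is close to the 0.5 scored by a classifier that only ever predicts the majority class.

```python
from itertools import product


def one_hot(values):
    """Map each nominal value to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical nominal column: the categories have no natural order,
# so integer codes would impose a spurious ranking on them.
marital_status = ["married", "single", "other", "single"]
categories, encoded = one_hot(marital_status)

# Toy imbalanced labels: 1 = default (minority class), 0 = no default.
labels = [0, 0, 0, 0, 0, 0, 1, 1]

# A model that ignores the minority class gives every instance the same score.
majority_only_scores = [0.0] * len(labels)
# A model that ranks most defaulters above most non-defaulters.
informative_scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.7, 0.4]

auc_majority_only = auc(labels, majority_only_scores)
auc_informative = auc(labels, informative_scores)

# Baseline AUC implied by the abstract's reported numbers.
baseline_auc = round(0.7184 - 0.2179, 4)
```

The pairwise AUC above is quadratic in the number of instances, which is fine for a toy example but not for 30,000 rows; a library routine would be the practical choice there.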

