Fraud Detection Using Random Forest Classifier, Logistic Regression, and Gradient Boosting Classifier Algorithms on Credit Cards
Muhamad Sopiyan, Fauziah Fauziah, Yunan Fauzi Wijaya
JUITA: Jurnal Informatika, published 2022-05-27. DOI: 10.30595/juita.v10i1.12050 (https://doi.org/10.30595/juita.v10i1.12050)
Citations: 3
Abstract
This study uses credit card records from the Kaggle dataset: 284,807 transactions made by European cardholders over two days. The dataset is highly imbalanced, with only 492 fraudulent transactions, or 0.172% of the 284,807 total. The purpose of this study is to obtain the best model and then use it to simulate the electronic detection of unauthorized financial transactions in bank payment systems. The class distribution is roughly 99.80% for the majority class and 0.2% for the minority class. This class imbalance is addressed by oversampling the minority class with the Synthetic Minority Oversampling Technique (SMOTE). To determine the most appropriate and accurate classifier for the class imbalance problem, the Random Forest Classifier (RFC), Logistic Regression (LGR), and Gradient Boosting Classifier (GBC) algorithms were compared. The results show that the Random Forest Classifier (RFC) outperforms the other algorithms: it has the highest accuracy, 100% on the training data and 99.99% on the test data, and an AUC score of 0.9999.
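The abstract does not say which SMOTE implementation was used, so the following is only a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data (NOT the Kaggle dataset): 200 majority, 4 minority points
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(3.0, 3.0), (3.5, 2.5), (2.8, 3.4), (3.2, 3.1)]

# Oversample the minority class until both classes have the same size
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 200 200
```

Each synthetic point lies on the line segment between two real minority samples, so the oversampled class stays inside the region already occupied by the minority data instead of duplicating existing rows.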