Deep convolutional neural network based synthetic minority over sampling technique: a forfending model for fraudulent credit card transactions in financial institution
Authors: L. G. Salaudeen, D. Gabi, M. Garba, H. Suru
Journal: Journal of the Nigerian Society of Physical Sciences
Published: 2024-05-12
DOI: 10.46481/jnsps.2024.2037 (https://doi.org/10.46481/jnsps.2024.2037)
Abstract
Fraudulent credit card transactions are committed by unauthorized individuals and organizations using tactics such as phishing and social engineering. Researchers have proposed several Machine Learning (ML) techniques to counter credit card fraud. However, these ML approaches face a number of challenges that make detecting credit card fraud extremely difficult. This study proposes a Deep Convolutional Neural Network (DCNN) combined with the Synthetic Minority Oversampling Technique (SMOTE) as a solution. A Kaggle dataset with 284,807 records and 31 features was used. The implementation ran on the cloud-based Google Colab platform, which provides a Jupyter notebook environment with Graphics Processing Units (GPUs). Two experiments were conducted. The first compared a set of baseline models to identify suitable candidates: Logistic Regression (LR), Random Forest (RF), Isolation Forest, and a single Deep Learning (DL) model, the Multilayer Perceptron (MLP). The baseline models yielded accuracy scores indicative of overfitting, with recall, specificity, precision, and F1-score each reported as 1.00%. Because such scores are biased on an imbalanced data distribution, this outcome is not sufficient to establish findings. This led to the construction of a new ML model incorporating Light Gradient Boosting Machine (LGBM), together with an Artificial Neural Network (ANN) and the proposed DCNN+SMOTE, for the second experimental phase alongside the baseline models. Simulation results show that the proposed DCNN+SMOTE performed strongly across all metrics, each reported as 1.00%. Its Error Rate (ER) and Null Error Rate (NER) are both 0.00%. Meanwhile, its False Positive Rate (FPR) is 0.001%, lower than that of the baseline models.
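The metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the abstract does not state its formulas, so the commonly used definitions are assumed (in particular, the null error rate is taken as the error incurred by always predicting the majority class). The names `ConfusionCounts` and `fraud_metrics`, and the counts in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix, with fraud as the positive class."""
    tp: int  # fraud correctly flagged
    fp: int  # legitimate transaction wrongly flagged
    tn: int  # legitimate transaction correctly passed
    fn: int  # fraud missed


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def fraud_metrics(c):
    """Compute the evaluation metrics under the assumed standard definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    positives = c.tp + c.fn  # actual fraud cases
    negatives = c.tn + c.fp  # actual legitimate cases
    recall = safe_div(c.tp, positives)
    precision = safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "recall": recall,
        "specificity": safe_div(c.tn, negatives),
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "error_rate": safe_div(c.fp + c.fn, total),
        # Assumed definition: error of a model that always predicts
        # the majority class, i.e. the minority-class share.
        "null_error_rate": safe_div(min(positives, negatives), total),
        "false_positive_rate": safe_div(c.fp, negatives),
    }


# Hypothetical counts: 10,000 transactions, 100 of them fraudulent.
counts = ConfusionCounts(tp=90, fp=5, tn=9895, fn=10)
metrics = fraud_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# A model that labels every transaction as legitimate.
majority_only = fraud_metrics(ConfusionCounts(tp=0, fp=0, tn=9900, fn=100))
```

With these hypothetical counts, the model that labels every transaction as legitimate still reaches an accuracy of 0.99 while its recall is 0, which illustrates why accuracy alone says little on an imbalanced dataset.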