Comparative Analysis of K-NN, Naïve Bayes, and Logistic Regression for Credit Card Fraud Detection

Impact Factor 0.3, JCR Q4, Engineering, Multidisciplinary

Ingenieria Solidaria. Pub Date: 2023-09-15. DOI: 10.16925/2357-6014.2023.03.05

Kavita Arora, Sonal Pathak, Nguyen Thi Dieu Linh


Citations: 0

Abstract

Introduction: This paper presents the outcome of a comparative study of machine learning algorithms, namely K-NN, Naïve Bayes, and Logistic Regression, for credit card fraud detection. The study was carried out on a dataset taken from UCI.com in 2022-23 at Manav Rachna International Institute of Research and Studies.

Problem: Credit card fraud is still rife today, and its modes are increasingly varied. Fraud cases frequently cause banks and financial institutions harm that cannot be compensated in terms of costs. To prevent the various modes of credit card scams, we must be able to identify the modes that fraudsters use most often. This scheme equips such financial institutions and banks, using machine learning techniques, with complete and appropriate information not only about the modes that fraudsters often use but also about ways to protect against such fraud.

Objective: The present paper discusses machine learning models based on classification and regression, namely K-Nearest Neighbors, Naïve Bayes, and Logistic Regression. Logistic Regression achieves a classification accuracy of 80%, with a Precision of 78%, Recall of 100%, and F1-Score of 88% for fraudulent credit card transactions.

Methodology: The comparative analysis demonstrates that, on the Precision, Recall, and Accuracy parameters, K-Nearest Neighbor is a better approach for detecting fraudulent transactions than Logistic Regression and Naïve Bayes.

Results: Accuracy is marginally higher for Logistic Regression, but the False Positive parameters fail to account for the imbalanced data and therefore disguise the results and accuracy of Logistic Regression; K-Nearest Neighbor is deemed a better fit for such cases.

Conclusion: This scheme depicts automated fraud classification systems using machine learning techniques, namely K-Nearest Neighbor, Logistic Regression, and Naïve Bayes, to produce a model that can distinguish valid from invalid credit card transactions.

Originality: In this research, the most relevant features are used, accuracy is visualized with the confusion matrix, and accuracy calculations are obtained from the dataset used.

Limitations: Deep learning techniques could have been used to obtain even better results.
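
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `ConfusionMatrix` class, the `f1_from` helper, and the transaction counts are hypothetical and chosen only for illustration. The sketch checks that the quoted Precision (78%) and Recall (100%) are consistent with the quoted F1-Score (88%), and shows why accuracy alone can mislead on imbalanced fraud data.

```python
from dataclasses import dataclass


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier, with fraud as the positive class."""
    tp: int  # fraudulent transactions correctly flagged
    fp: int  # legitimate transactions wrongly flagged as fraud
    fn: int  # fraudulent transactions that were missed
    tn: int  # legitimate transactions correctly passed

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        actual_fraud = self.tp + self.fn
        return self.tp / actual_fraud if actual_fraud else 0.0

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())


# Consistency check on the figures quoted in the abstract:
# 2 * 0.78 * 1.00 / (0.78 + 1.00) is about 0.876, i.e. 88% after rounding.
reported_f1 = f1_from(0.78, 1.00)

# Hypothetical counts (NOT from the paper): 1,000 transactions, 10 of them
# fraudulent, scored by a degenerate model that labels everything legitimate.
always_legit = ConfusionMatrix(tp=0, fp=0, fn=10, tn=990)

if __name__ == "__main__":
    print(f"F1 from P=0.78, R=1.00: {reported_f1:.3f}")
    print(f"always-legit accuracy:  {always_legit.accuracy():.2f}")
    print(f"always-legit recall:    {always_legit.recall():.2f}")
```

In this hypothetical case the degenerate model reaches 99% accuracy while catching no fraud at all, which is the kind of effect the Results paragraph points to when it cautions against reading accuracy in isolation on imbalanced data.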




Source journal

Ingenieria Solidaria (Engineering, Multidisciplinary)

Self-citation rate

0.00%

Number of publications