Data Quality Analysis based Machine Learning models for Credit Card Fraud Detection

Journal of the University of Shanghai for Science and Technology. Published: 2021-06-07. DOI: 10.51201/JUSST/21/05263

Amit Pundir, Rajesh Pandey

Vol. 40, No. 1, pp. 318-344

Citations: 0

Abstract

Financial fraud is a growing problem in the financial industry with far-reaching consequences, and many methods for detecting it have been proposed. Data quality management combined with data mining has been applied successfully to automate the analysis of large amounts of complex data, and data mining has also played a notable role in identifying credit card fraud in online transactions. Credit card fraud detection, treated here as a data quality management problem within data mining, is challenging for two main reasons: first, the profiles of normal and fraudulent behavior change frequently, and second, credit card fraud data are highly skewed. This paper examines the performance of decision trees, logistic regression, and random forests on highly skewed credit card fraud data. The credit card transaction dataset, containing 284,807 transactions, is sourced from Kaggle, a publicly accessible dataset repository. The methods are applied both to the raw data values and to data processed with preprocessing techniques. Performance is assessed using accuracy, sensitivity, specificity, precision, and recall. The results show best accuracies of 90.8%, 98.5%, and 99.1% for the decision tree, logistic regression, and random forest classifiers, respectively.
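The abstract evaluates the classifiers with accuracy, sensitivity, specificity, precision, and recall. The following minimal sketch, which is not the authors' code, shows how these metrics follow from confusion-matrix counts when fraud is treated as the positive class. The `ConfusionMatrix` and `metrics` names are hypothetical, and the demo's class counts (492 frauds among 284,807 transactions) are an assumption based on the public Kaggle credit card fraud dataset; the paper's own train/test split may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier where 'positive' means fraud."""
    tp: int  # frauds correctly flagged
    fp: int  # legitimate transactions wrongly flagged as fraud
    tn: int  # legitimate transactions correctly passed
    fn: int  # frauds missed


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when the denominator is zero
    return num / den if den else 0.0


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute the evaluation metrics named in the abstract."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": safe_div(cm.tp + cm.tn, total),
        # Sensitivity and recall are the same quantity: TP / (TP + FN)
        "sensitivity": safe_div(cm.tp, cm.tp + cm.fn),
        "recall": safe_div(cm.tp, cm.tp + cm.fn),
        "specificity": safe_div(cm.tn, cm.tn + cm.fp),
        "precision": safe_div(cm.tp, cm.tp + cm.fp),
    }


if __name__ == "__main__":
    # Assumed class counts for the raw Kaggle data (hypothetical for this paper)
    n_total, n_fraud = 284_807, 492
    # A trivial model that labels every transaction as legitimate
    always_legit = ConfusionMatrix(tp=0, fp=0, tn=n_total - n_fraud, fn=n_fraud)
    for name, value in metrics(always_legit).items():
        print(f"{name:>11}: {value:.4f}")
```

Under the assumed raw class ratio, the trivial "never flag fraud" model reaches about 99.83% accuracy while its recall is 0. This illustrates why class-specific metrics such as sensitivity, specificity, and precision are reported alongside accuracy on highly skewed data.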


