Data Quality Analysis based Machine Learning models for Credit Card Fraud Detection
Authors: Amit Pundir, Rajesh Pandey
Journal of the University of Shanghai for Science and Technology, volume 40 1, pages 318-344. Published 2021-06-07. DOI: https://doi.org/10.51201/JUSST/21/05263
Financial fraud is a growing problem in the financial industry with far-reaching consequences, even though many detection techniques have been developed. Data quality management combined with data mining has been applied effectively to automate the analysis of massive amounts of complex data, and data mining has likewise played a notable role in identifying credit card fraud in online transactions. Credit card fraud detection, treated here as a data quality management problem within data mining, is challenging for two main reasons: first, the profiles of normal and fraudulent behavior change constantly, and second, credit card fraud data is highly skewed. This paper examines the performance of Decision Tree, Logistic Regression, and Random Forest classifiers on highly skewed credit card fraud data. The dataset of credit card transactions is sourced from Kaggle (a publicly accessible dataset repository) and contains 284,807 transactions. The methods are applied to both the raw data and the preprocessed data. Performance is assessed in terms of accuracy, sensitivity, specificity, precision, and recall. Results indicate optimal accuracies of 90.8%, 98.5%, and 99.1% for the decision tree, logistic regression, and random forest classifiers, respectively.
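
The evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the `Metrics` class, the `evaluate` function, and the toy labels are hypothetical, and the data is synthetic rather than drawn from the Kaggle dataset. It shows why accuracy alone can mislead on highly skewed data, and it treats sensitivity and recall as the same quantity, TP / (TP + FN).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    sensitivity: float  # identical to recall: TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    precision: float    # TP / (TP + FP)


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute metrics from binary labels, where 1 marks a fraudulent transaction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Metrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
    )


# Hypothetical toy data: 1,000 transactions, 5 of them fraudulent.
y_true = [1] * 5 + [0] * 995

# A degenerate "classifier" that labels every transaction as legitimate.
baseline = evaluate(y_true, [0] * 1000)

# A hypothetical detector that catches 4 of the 5 frauds and raises
# 10 false alarms among the 995 legitimate transactions.
detector_pred = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 985
detector = evaluate(y_true, detector_pred)

print(baseline)
print(detector)
```

In this toy setting the always-legitimate baseline has the higher accuracy of the two but zero sensitivity, which is why the abstract's other metrics matter when comparing classifiers on imbalanced fraud data.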