Jiaxin Gao, Zirui Zhou, Jiangshan Ai, Bingxin Xia, Stephen Coggeshall
{"title":"Predicting Credit Card Transaction Fraud Using Machine Learning Algorithms","authors":"Jiaxin Gao, Zirui Zhou, Jiangshan Ai, Bingxin Xia, Stephen Coggeshall","doi":"10.4236/JILSA.2019.113003","DOIUrl":null,"url":null,"abstract":"Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.","PeriodicalId":69452,"journal":{"name":"智能学习系统与应用(英文)","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"智能学习系统与应用(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.4236/JILSA.2019.113003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.