基于非平衡数据集的信用卡匿名交易欺诈检测的高效重采样

2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC) Pub Date : 2020-12-01 DOI:10.1109/UCC48980.2020.00067

Petr Mrozek, John Panneerselvam, O. Bagdasar

{"title":"基于非平衡数据集的信用卡匿名交易欺诈检测的高效重采样","authors":"Petr Mrozek, John Panneerselvam, O. Bagdasar","doi":"10.1109/UCC48980.2020.00067","DOIUrl":null,"url":null,"abstract":"The rapid growth of e-commerce and online shopping have resulted in an unprecedented increase in the amount of money that is annually lost to credit card fraudsters. In an attempt to address credit card fraud, researchers are leveraging the application of various machine learning techniques for efficiently detecting and preventing fraudulent credit card transactions. One of the prevalent common issues around the analytics of credit card transactions is the highly unbalanced nature of the datasets, which is frequently associated with the binary classification problems. This paper intends to review, analyse and implement a selection of notable machine learning algorithms such as Logistic Regression, Random Forest, K-Nearest Neighbours and Stochastic Gradient Descent, with the motivation of empirically evaluating their efficiencies in handling unbalanced datasets whilst detecting credit card fraud transactions. A publicly available dataset comprising 284807 transactions of European cardholders is analysed and trained with the studied machine learning techniques to detect fraudulent transactions. Furthermore, this paper also evaluates the incorporation of two notable resampling methods, namely Random Under-sampling and Synthetic Majority Oversampling Techniques (SMOTE) in the aforementioned algorithms, in order to analyse their efficiency in handling unbalanced datasets. The proposed resampling methods significantly increased the detection ability, the most successful technique of combination of Random Forest with Random Under-sampling achieved the recall score of 100% in contrast to the recall score 77% of model without resampling technique. The key contribution of this paper is the postulation of efficient machine learning algorithms together with suitable resampling methods, suitable for credit card fraud detection with unbalanced dataset.","PeriodicalId":125849,"journal":{"name":"2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Efficient Resampling for Fraud Detection During Anonymised Credit Card Transactions with Unbalanced Datasets\",\"authors\":\"Petr Mrozek, John Panneerselvam, O. Bagdasar\",\"doi\":\"10.1109/UCC48980.2020.00067\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid growth of e-commerce and online shopping have resulted in an unprecedented increase in the amount of money that is annually lost to credit card fraudsters. In an attempt to address credit card fraud, researchers are leveraging the application of various machine learning techniques for efficiently detecting and preventing fraudulent credit card transactions. One of the prevalent common issues around the analytics of credit card transactions is the highly unbalanced nature of the datasets, which is frequently associated with the binary classification problems. This paper intends to review, analyse and implement a selection of notable machine learning algorithms such as Logistic Regression, Random Forest, K-Nearest Neighbours and Stochastic Gradient Descent, with the motivation of empirically evaluating their efficiencies in handling unbalanced datasets whilst detecting credit card fraud transactions. A publicly available dataset comprising 284807 transactions of European cardholders is analysed and trained with the studied machine learning techniques to detect fraudulent transactions. Furthermore, this paper also evaluates the incorporation of two notable resampling methods, namely Random Under-sampling and Synthetic Majority Oversampling Techniques (SMOTE) in the aforementioned algorithms, in order to analyse their efficiency in handling unbalanced datasets. The proposed resampling methods significantly increased the detection ability, the most successful technique of combination of Random Forest with Random Under-sampling achieved the recall score of 100% in contrast to the recall score 77% of model without resampling technique. The key contribution of this paper is the postulation of efficient machine learning algorithms together with suitable resampling methods, suitable for credit card fraud detection with unbalanced dataset.\",\"PeriodicalId\":125849,\"journal\":{\"name\":\"2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)\",\"volume\":\"71 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UCC48980.2020.00067\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UCC48980.2020.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

电子商务和网上购物的快速发展导致信用卡诈骗者每年损失的金额空前增加。为了解决信用卡欺诈问题，研究人员正在利用各种机器学习技术的应用来有效地检测和防止欺诈性信用卡交易。信用卡交易分析的一个普遍问题是数据集的高度不平衡，这通常与二元分类问题有关。本文旨在回顾、分析和实现一系列著名的机器学习算法，如逻辑回归、随机森林、k近邻和随机梯度下降，其动机是在检测信用卡欺诈交易的同时，通过经验评估它们在处理不平衡数据集方面的效率。一个公开可用的数据集包括284807笔欧洲持卡人的交易，并使用研究的机器学习技术进行分析和训练，以检测欺诈交易。此外，本文还评估了上述算法中两种著名的重采样方法，即随机欠采样和合成多数过采样技术(SMOTE)，以分析它们在处理不平衡数据集方面的效率。提出的重采样方法显著提高了检测能力，其中随机森林与随机欠采样相结合的方法最成功，召回率达到100%，而没有重采样的模型召回率为77%。本文的关键贡献是假设了有效的机器学习算法以及合适的重采样方法，适用于不平衡数据集的信用卡欺诈检测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient Resampling for Fraud Detection During Anonymised Credit Card Transactions with Unbalanced Datasets

The rapid growth of e-commerce and online shopping have resulted in an unprecedented increase in the amount of money that is annually lost to credit card fraudsters. In an attempt to address credit card fraud, researchers are leveraging the application of various machine learning techniques for efficiently detecting and preventing fraudulent credit card transactions. One of the prevalent common issues around the analytics of credit card transactions is the highly unbalanced nature of the datasets, which is frequently associated with the binary classification problems. This paper intends to review, analyse and implement a selection of notable machine learning algorithms such as Logistic Regression, Random Forest, K-Nearest Neighbours and Stochastic Gradient Descent, with the motivation of empirically evaluating their efficiencies in handling unbalanced datasets whilst detecting credit card fraud transactions. A publicly available dataset comprising 284807 transactions of European cardholders is analysed and trained with the studied machine learning techniques to detect fraudulent transactions. Furthermore, this paper also evaluates the incorporation of two notable resampling methods, namely Random Under-sampling and Synthetic Majority Oversampling Techniques (SMOTE) in the aforementioned algorithms, in order to analyse their efficiency in handling unbalanced datasets. The proposed resampling methods significantly increased the detection ability, the most successful technique of combination of Random Forest with Random Under-sampling achieved the recall score of 100% in contrast to the recall score 77% of model without resampling technique. The key contribution of this paper is the postulation of efficient machine learning algorithms together with suitable resampling methods, suitable for credit card fraud detection with unbalanced dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)

自引率

0.00%

发文量