Random Forest Classifier for Detecting Credit Card Fraud based on Performance Metrics

2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) Pub Date : 2020-12-16 DOI:10.1109/CSDE50874.2020.9411563

Maung Hein Aung, Penelope Tane Seluka, Jean Tiana Rose Fuata, Maria Josephine Tikoisuva, Matalita Seremana Cabealawa, Ravneil Nand


Citations: 4


Random Forest Classifier for Detecting Credit Card Fraud based on Performance Metrics

Many classification algorithms are available, but a single classifier that delivers the highest accuracy for a given problem domain is hard to find. A classification algorithm is a technique that maps data into known classes or outputs. Credit card fraud is a problem area in which classification algorithms have been applied extensively. Although it is not a new research area, there is still scope to narrow down which classification algorithm can best be relied upon to detect fraud in real time. This paper investigates which classification algorithm is best for detecting credit card fraud, using benchmark datasets. Random Forest was found to have the best accuracy among the classifiers compared. The study, together with the guideline it provides, is intended to help researchers choose a classification scheme for any credit card fraud dataset. The two datasets used in this research are imbalanced; therefore, for a better comparison of the algorithms, a balanced set is also used. The datasets are balanced with the Synthetic Minority Oversampling Technique (SMOTE). Six algorithms are compared: Random Forest, Logistic Regression, Neural Networks, Support Vector Machines (SVMs), Naive Bayes, and K-Nearest Neighbor (KNN). The results are compared across two software tools, Weka and Python. The outcome of the experiments shows that the methodology is of great assistance in practical applications.
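The abstract does not show how the balancing step was implemented. As an illustration only, the core idea of the Synthetic Minority Oversampling Technique can be sketched in standard-library Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data below are assumptions made for this sketch; they are not taken from the paper, and a real experiment would more likely use an established library implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up): two features, 4 fraud rows versus 12 legitimate rows.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
legit = [(float(x), float(y)) for x in range(5, 9) for y in range(5, 8)]

# Oversample the fraud class until both classes have the same size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=3)
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

In a comparison like the one the abstract describes, oversampling of this kind is normally applied to the training portion only, so that the test data still reflects the original class imbalance.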
