Maung Hein Aung, Penelope Tane Seluka, Jean Tiana Rose Fuata, Maria Josephine Tikoisuva, Matalita Seremana Cabealawa, Ravneil Nand
2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), published 2020-12-16. DOI: 10.1109/CSDE50874.2020.9411563 (https://doi.org/10.1109/CSDE50874.2020.9411563)
Random Forest Classifier for Detecting Credit Card Fraud based on Performance Metrics
Many classification algorithms are available, but it is hard to find a single classifier that delivers the highest accuracy for a given problem domain. A classification algorithm maps data into known classes or outputs. One problem area that has seen extensive application of classification algorithms is credit card fraud. Credit card fraud detection is not a new research area, but there is still scope to narrow down which classification algorithm is best to rely on for detecting fraud in real time. This paper investigates which classification algorithm performs best at detecting credit card fraud on benchmark datasets. Random Forest was found to have the best accuracy of the classifiers compared. The study, together with the guideline it provides, is intended to help researchers choose a classification scheme for any credit card fraud dataset. The two datasets used in this research are imbalanced, so a balanced version is also used to allow a better comparison of the algorithms. The datasets are balanced with the Synthetic Minority Oversampling Technique (SMOTE). Six algorithms are compared: Random Forest, Logistic Regression, Neural Networks, Support Vector Machines (SVMs), Naive Bayes and K-Nearest Neighbor (KNN). The results are compared across two software tools, Weka and Python. The outcome of the experiments shows that the methodology is of great assistance in practical applications.
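The balancing step named in the abstract, the Synthetic Minority Oversampling Technique (usually abbreviated SMOTE), can be illustrated with a minimal sketch. This is not the authors' code: the function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration, and it uses only the Python standard library. It shows the core idea of creating each synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each synthetic point: pick a minority sample at random, pick one of
    its k nearest minority neighbours, and take a random point on the line
    segment joining the two. (Hypothetical helper, simplified for clarity.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data (made up): 20 "legitimate" points and 4 "fraud" points.
legit = [(float(x), float(y)) for x in range(5) for y in range(4)]
fraud = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

# Generate enough synthetic fraud points to match the majority class size.
new_fraud = smote_oversample(fraud, n_new=len(legit) - len(fraud), k=3)
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))  # prints: 20 20
```

In a real comparison, oversampling would normally be applied to the training split only, so that the test set keeps the original class ratio. Library implementations such as the `SMOTE` class in imbalanced-learn are a more practical choice than a hand-written version.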