Authors: Thanh Cong Tran, T. K. Dang
Venue: 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Publication date: 2021-01-04
DOI: 10.1109/IMCOM51814.2021.9377352 (https://doi.org/10.1109/IMCOM51814.2021.9377352)
Machine Learning for Prediction of Imbalanced Data: Credit Fraud Detection
Online transactions have increased drastically over the past decades, and credit card transactions account for a large share of them. This growth has been accompanied by a rise in credit card fraud, causing losses in the finance industry. It is therefore vital to build reliable fraud detection systems that classify transactions into two labels, fraud and no-fraud. However, the data are highly imbalanced between these two labels. In this paper, we use two resampling approaches, the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN), to turn an imbalanced dataset into a balanced one. Four machine learning (ML) algorithms, namely random forest, k-nearest neighbors, decision tree, and logistic regression, are applied to this balanced dataset. A comprehensive set of classification measurements, including fundamental, combined, and graphical measurements, is used to evaluate the performance of these models. We observe that, after resampling the dataset, these ML algorithms show positive classification results for fraudulent activities.
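The resampling step described in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its k nearest minority neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch uses only the Python standard library rather than a dedicated resampling package. ADASYN follows the same interpolation idea but generates more synthetic samples around minority points that have more majority-class neighbours; that weighting is not shown here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; not the paper's code.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up data: 8 "no-fraud" points and 3 "fraud" points in 2-D.
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

After this step the two classes have equal size, and any of the classifiers named in the abstract could be trained on the combined data. In practice, resampling is usually applied to the training split only, so that the evaluation data keep the original class distribution.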