Enhancing Credit Card Fraud Detection Through a Novel Ensemble Feature Selection Technique

Huanjing Wang, Qianxin Liang, John T. Hancock, T. Khoshgoftaar

2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), August 2023
DOI: 10.1109/IRI58017.2023.00028
Citations: 0
Abstract
Identifying fraudulent activities in credit card transactions is an inherent component of financial computing. Our research focuses on the Credit Card Fraud Detection Dataset, which is widely used because it contains authentic transaction data. Feature selection has become a crucial step in many machine learning applications. To improve the chance of discovering the globally optimal feature set, we employ ensembles of feature ranking methods, which merge multiple feature ranking lists through a median approach. We conduct a comprehensive empirical study of two ensembles of feature ranking techniques: an ensemble of twelve threshold-based feature selection (TBFS) techniques and an ensemble of five supervised feature selection (SFS) techniques. We also present results where all features are used. We construct classification models with two Decision Tree-based classifiers, CatBoost and XGBoost, and evaluate them using two performance metrics: the Area Under the Receiver Operating Characteristic Curve (AUC) and the Area Under the Precision-Recall Curve (AUPRC). Because AUPRC more accurately reflects the number of false positives, especially on highly imbalanced datasets, it is a sound choice for model evaluation. The experimental results demonstrate that the SFS ensemble and the all-features baseline perform similarly to, or better than, the TBFS ensemble. Moreover, we find that XGBoost outperforms CatBoost in terms of AUPRC.
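The abstract describes merging multiple feature ranking lists through a median approach. The paper does not give implementation details, so the following is only a minimal sketch of one plausible reading: for each feature, take the median of its rank positions across the individual rankers, then order features by that median. The function name and tie-breaking rule are illustrative assumptions, not the authors' code.

```python
from statistics import median

def ensemble_rank(ranking_lists):
    """Merge several feature ranking lists via a median approach (a sketch,
    not the paper's implementation).

    Each list in `ranking_lists` orders the same feature names from most
    to least important (position 0 = most important). For every feature we
    take the median of its positions across all lists, then sort features
    by that median (ties broken alphabetically, an assumed convention).
    """
    features = ranking_lists[0]
    # median rank position of each feature across all rankers
    med = {f: median(r.index(f) for r in ranking_lists) for f in features}
    return sorted(features, key=lambda f: (med[f], f))

# Example: three rankers mostly agree that "amount" matters most.
rankings = [
    ["amount", "time", "v14"],
    ["time", "amount", "v14"],
    ["amount", "v14", "time"],
]
print(ensemble_rank(rankings))  # → ['amount', 'time', 'v14']
```

A top-k prefix of the merged list would then serve as the selected feature subset fed to the classifiers.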
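AUPRC, the metric the abstract favors for imbalanced data, can be computed as average precision: the sum of precision values weighted by the step increase in recall at each true positive. The dependency-free sketch below assumes distinct scores (ties would need the interpolation rules a library such as scikit-learn applies); it is illustrative, not the paper's evaluation code.

```python
def average_precision(y_true, scores):
    """Area under the precision-recall curve via the average-precision
    formulation: AP = sum_n (R_n - R_{n-1}) * P_n, where P_n and R_n are
    precision and recall at the n-th threshold. Assumes distinct scores.
    """
    # Rank examples by descending classifier score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            precision = tp / k          # positives among top-k predictions
            recall = tp / total_pos     # positives recovered so far
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

# Toy example: two frauds among four transactions, one ranked first,
# one ranked third → AP = 1.0 * 0.5 + (2/3) * 0.5 ≈ 0.833
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because precision is computed relative to predicted positives rather than to all negatives, a flood of false positives drags AUPRC down sharply even when AUC barely moves, which is why the authors prefer it on this highly imbalanced dataset.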