Enhancing Credit Card Fraud Detection Through a Novel Ensemble Feature Selection Technique

2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI); Pub Date: 2023-08-01; DOI: 10.1109/IRI58017.2023.00028

Huanjing Wang, Qianxin Liang, John T. Hancock, T. Khoshgoftaar



Abstract

Identifying fraudulent credit card transactions is a core task in financial computing. Our research focuses on the Credit Card Fraud Detection Dataset, which is widely used because it contains authentic transaction data. Feature selection has become a crucial step in many machine learning applications. To improve the chance of discovering the globally optimal feature set, we employ ensembles of feature ranking methods, which merge multiple feature ranking lists through a median-based approach. We conduct a comprehensive empirical study of two ensembles of feature ranking techniques: an ensemble of twelve threshold-based feature selection (TBFS) techniques and an ensemble of five supervised feature selection (SFS) techniques. We also present results where all features are used. We build classification models with two decision tree-based classifiers, CatBoost and XGBoost, and evaluate them with two performance metrics: the Area Under the Receiver Operating Characteristic Curve (AUC) and the Area Under the Precision-Recall Curve (AUPRC). Because AUPRC gives a more accurate picture of the number of false positives, especially on highly imbalanced datasets, it is a sound choice of evaluation metric. The experimental results show that the SFS ensemble and the full feature set each perform similarly to or better than the TBFS ensemble. Moreover, XGBoost outperforms CatBoost in terms of AUPRC.
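
The abstract says the ensembles merge several feature ranking lists through a median approach but gives no further detail. The following is a minimal sketch of one plausible reading, in which each feature's final score is the median of its rank positions across the lists. The function name `median_rank_ensemble`, the `top_k` cutoff, the alphabetical tie-break, and the feature names `f1` to `f5` are all illustrative assumptions and do not come from the paper.

```python
from statistics import median


def median_rank_ensemble(rankings, top_k):
    """Merge feature rankings by the median of each feature's rank position.

    `rankings` is a list of lists; each inner list orders the same features
    from most to least important. This is an illustrative sketch, not the
    authors' implementation.
    """
    expected = set(rankings[0])
    positions = {feature: [] for feature in rankings[0]}
    for ranking in rankings:
        # Every ranker must rank exactly the same set of features.
        if set(ranking) != expected or len(ranking) != len(expected):
            raise ValueError("all rankings must cover the same features")
        for rank, feature in enumerate(ranking, start=1):
            positions[feature].append(rank)

    # Median rank per feature; lower means more important.
    median_rank = {f: median(ranks) for f, ranks in positions.items()}

    # Assumption: ties are broken alphabetically so the result is deterministic.
    ordered = sorted(median_rank, key=lambda f: (median_rank[f], f))
    return ordered[:top_k]


# Three hypothetical rankers, each ordering the same five placeholder features.
rankings = [
    ["f1", "f2", "f3", "f4", "f5"],
    ["f2", "f1", "f4", "f3", "f5"],
    ["f1", "f3", "f2", "f5", "f4"],
]
selected = median_rank_ensemble(rankings, top_k=3)
print(selected)
```

Taking the median rather than the mean keeps a single outlying ranker from moving a feature far up or down the merged list, which may be one reason to prefer it when the component rankers disagree.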


