GAN-Based Hybrid Sampling Method for Transaction Fraud Detection

IF 10.4 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering · Pub Date: 2025-07-16 · DOI: 10.1109/TKDE.2025.3589885

Yu Xie, Junkai Shan, Lifei Wei, Jiamin Yao, MengChu Zhou

Volume 37, Issue 10, pp. 5905-5918

Citations: 0

Abstract




In the digital era, effective Transaction Fraud Detection (TFD) is essential to ensuring financial security. The considerable class imbalance, with legitimate transactions vastly outnumbering fraudulent ones, presents a significant challenge for TFD models to accurately identify fraudulent patterns. While existing sample-balancing strategies address class imbalance effectively in many contexts, they often fall short in TFD due to fraudsters’ sophisticated concealment tactics, which lead to pronounced behavioral overlap between fraudulent and legitimate transactions. In this paper, we introduce a novel Generative Adversarial Network-based Hybrid Sampling method (GANHS) to effectively address the class imbalance issue. GANHS employs a dual-discriminator generative adversarial network to generate synthetic samples that accurately reflect the characteristics of fraudulent activity, while an adaptive neighborhood-based undersampling technique refines these samples to minimize overlap with legitimate ones. This hybrid approach not only enhances the model’s ability to learn fraud patterns by generating high-quality samples but also improves its resilience against highly concealed fraudulent activities. Experiments on real-world and public datasets demonstrate that GANHS outperforms its competitive peers, with gains of 0.5%–8.7% in average $F_{1}$-Score and 1.0%–7.0% in G-mean, highlighting its strong potential for improving the reliability and effectiveness of TFD systems in complex, high-risk financial scenarios.
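The abstract reports results as F1-Score and G-mean and describes a neighborhood-based step that trims synthetic fraud samples overlapping legitimate ones. The Python sketch below illustrates both ideas under stated assumptions. The metrics follow the standard definitions, F1 = 2PR/(P+R) and G-mean = sqrt(TPR × TNR), with fraud as the positive class. The filter is a hypothetical, fixed-k, ENN-style stand-in: the helper names `f1_and_gmean` and `filter_overlapping_synthetic`, the Euclidean distance, `k`, and `max_legit_frac` are illustrative assumptions. It is not the paper's adaptive undersampling rule, which the abstract does not specify (including whether synthetic or legitimate samples are removed), and it does not model the dual-discriminator GAN that produces the synthetic samples.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def f1_and_gmean(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Standard F1-Score and G-mean, treating fraud as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (sensitivity)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean


def filter_overlapping_synthetic(
    synthetic: Sequence[Point],
    fraud: Sequence[Point],
    legit: Sequence[Point],
    k: int = 3,
    max_legit_frac: float = 0.5,
) -> List[Point]:
    """Hypothetical ENN-style filter (NOT the paper's adaptive method).

    Drops a synthetic fraud sample when more than `max_legit_frac` of its
    k nearest real neighbors (Euclidean distance) are legitimate, i.e. when
    it lies inside the region where the two classes overlap.
    """
    # Reference pool of real transactions: label 1 = fraud, 0 = legitimate.
    pool = [(p, 1) for p in fraud] + [(p, 0) for p in legit]
    if not pool or k < 1:
        return list(synthetic)  # nothing to compare against; keep everything
    kept = []
    for s in synthetic:
        # k nearest real transactions to this synthetic sample.
        nearest = sorted(pool, key=lambda item: math.dist(s, item[0]))[:k]
        legit_frac = sum(1 for _, label in nearest if label == 0) / len(nearest)
        if legit_frac <= max_legit_frac:
            kept.append(s)
    return kept


if __name__ == "__main__":
    fraud = [(0.0, 0.0), (0.0, 1.0)]
    legit = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    synthetic = [(0.5, 0.5), (10.5, 10.5)]
    print(filter_overlapping_synthetic(synthetic, fraud, legit))  # [(0.5, 0.5)]
    print(f1_and_gmean(tp=80, fp=20, fn=20, tn=880))
```

In this toy example the synthetic point near the real fraud cluster is kept, while the one surrounded by legitimate transactions is dropped. Lowering `max_legit_frac` removes more borderline synthetic samples at the cost of fewer synthetic fraud examples; raising it keeps more of them.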


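Source Journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months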

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.