A Noise-Resistant Model for Graph-based Fraud Detection

IF 6.9 · Region 1 (Management) · JCR Q1 · COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management · Pub Date: 2025-05-02 · DOI: 10.1016/j.ipm.2025.104198

Zhengyang Liu, Hang Yu, Xiangfeng Luo

Volume 62, Issue 5, Article 104198. Full text: https://www.sciencedirect.com/science/article/pii/S0306457325001396

Citations: 0

Abstract


A Noise-Resistant Model for Graph-based Fraud Detection

Graph-based fraud detection is a critical task that identifies anomalous nodes deviating from the majority of normal nodes in a graph. It applies to many practical settings, including fake review detection, fraudulent transaction detection, and bot account detection. Current graph fraud detection models build on Graph Neural Networks (GNNs) and have achieved significant success by modeling both homogeneous and heterogeneous edges. However, these methods assume that a sufficient proportion of nodes carry completely accurate labels, overlooking the noisy labels found in real-world scenarios, which can significantly degrade their performance. To address this challenge, we propose a Noise-Resistant Model for Graph Fraud Detection. First, we design a foundational graph fraud detection model from a spectral perspective to capture both homogeneous and heterogeneous information of nodes. Using a conditional variational autoencoder (CVAE), we obtain node features augmented from different perspectives. Next, nodes with noisy labels are trained alongside nodes with clean labels. Using a self-supervised approach, noisy nodes whose high-confidence predictions agree with their labels are gradually incorporated into the training set. For nodes with lower confidence, we aim to learn better representations and gradually include more of them in the training set. Using the augmented features generated by the CVAE together with a support set constructed from clean labels, we compute a consistency loss with adversarial strategies so that features augmented from the normal and anomalous perspectives are each brought closer to the relevant category in the support set. Extensive experiments comparing our method with twelve state-of-the-art baselines on six real-world datasets (Amazon, Yelp, Elliptic, FDCompCN, T-Finance, and T-Social) demonstrate the superiority of our model.
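
The abstract's step of gradually admitting noisy-labeled nodes into the training set can be illustrated with a minimal Python sketch. This is not the authors' implementation: the linear threshold schedule, the default values 0.95 and 0.6, and the function names `confidence_threshold` and `select_noisy_nodes` are all assumptions made for illustration. The only idea taken from the abstract is that a noisy node is admitted when the model's prediction is confident and agrees with the node's given label, and that more nodes are admitted over time.

```python
from typing import List, Sequence


def confidence_threshold(epoch: int, start: float = 0.95, end: float = 0.6,
                         total_epochs: int = 10) -> float:
    """Linearly relax the threshold from `start` to `end` (assumed schedule)."""
    if total_epochs <= 1:
        return end
    # Clamp the epoch into [0, total_epochs - 1] before interpolating.
    progress = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return start + (end - start) * progress


def select_noisy_nodes(probs: Sequence[Sequence[float]],
                       noisy_labels: Sequence[int],
                       threshold: float) -> List[int]:
    """Return indices of noisy-labeled nodes whose predicted class matches
    the given label with confidence at or above `threshold`."""
    selected = []
    for idx, (p, y) in enumerate(zip(probs, noisy_labels)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if pred == y and p[pred] >= threshold:
            selected.append(idx)
    return selected


# Hypothetical class probabilities (normal, fraud) for four noisy-labeled nodes.
probs = [[0.97, 0.03], [0.30, 0.70], [0.10, 0.90], [0.55, 0.45]]
noisy_labels = [0, 1, 0, 0]

early = select_noisy_nodes(probs, noisy_labels, confidence_threshold(0))
late = select_noisy_nodes(probs, noisy_labels, confidence_threshold(9))
print(early, late)
```

With these made-up numbers, only the first node passes the strict early threshold; the second node is added once the threshold relaxes, while the third node is never selected because its prediction contradicts its label.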


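Source journal

Information Processing & Management (Engineering & Technology; Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days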

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.