Multi-faceted consistency data augmentation for graph anomaly detection

IF 6.9 | Zone 1, Management | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management | Pub Date: 2025-08-25 | DOI: 10.1016/j.ipm.2025.104338

Tairan Huang, Yili Wang, Qiutong Li, Jianliang Gao

Volume 63, Issue 1, Article 104338. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0306457325002791

Citations: 0


Multi-faceted consistency data augmentation for graph anomaly detection

Graph-based anomaly detection has become a prominent research area, driven by its critical applications in domains such as fraud detection, financial security, and biomedicine. However, existing methods face significant challenges, including label imbalance, feature camouflage, and limited supervision. In this paper, we propose McGAD, a method that incorporates two facets of consistency data augmentation: Structural Consistency Augmentation and Learnable Unsupervised Consistency Augmentation. Specifically, in structural consistency augmentation we use a heat wavelet diffusion pattern to capture the spectral graph wavelets of the nodes, and we treat these wavelets as probability distributions. McGAD uses an empirical characteristic function to convert the wavelets into low-dimensional embeddings that characterize the neighborhood of each node. Nodes whose neighborhoods have high structural consistency will have similar structural embeddings even if they are far apart in the graph, a property for which we provide a mathematical proof. This yields more effective embedding information for structurally consistent nodes of the same class, which helps address the label imbalance and feature camouflage problems. Moreover, we design a learnable unsupervised consistency augmentation module to handle the case of limited supervision. We make the whole augmentation process learnable, which enables the model to fully exploit the information from unlabeled nodes. We conduct extensive experiments on four benchmark datasets to demonstrate the superiority of McGAD. In particular, with a training ratio of only 1%, McGAD achieves 93.67% AUC and 91.81% F1-Macro on Amazon, outperforming 15 state-of-the-art baselines by up to 4.90% AUC and 3.75% F1-Macro.
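
The structural step described in the abstract (heat wavelets treated as probability distributions, then summarized by an empirical characteristic function) can be illustrated with a small NumPy sketch. This is only a GraphWave-style illustration under stated assumptions, not McGAD's actual implementation: the function names, the combinatorial Laplacian, the diffusion scale, and the sample points are all hypothetical choices that the abstract does not specify. On a 7-node path graph, the two endpoints are six hops apart but structurally equivalent, so their embeddings should coincide up to floating-point error.

```python
import numpy as np


def heat_wavelets(adj, scale=1.0):
    """Heat-kernel wavelets Psi = U exp(-scale * Lambda) U^T.

    Column a of the result is the wavelet centered at node a.
    Assumption: combinatorial Laplacian L = D - A (illustrative choice).
    """
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)  # symmetric eigendecomposition
    return eigvecs @ np.diag(np.exp(-scale * eigvals)) @ eigvecs.T


def structural_embeddings(adj, scale=1.0, sample_points=None):
    """Embed each node via the empirical characteristic function (ECF)
    of its wavelet coefficients, sampled at a few points t_k.

    phi[a, k] = mean over m of exp(i * t_k * Psi[m, a])
    The embedding concatenates the real and imaginary parts of phi.
    """
    if sample_points is None:
        # Hypothetical sampling grid; not taken from the paper.
        sample_points = np.linspace(0.0, 10.0, 20)
    psi = heat_wavelets(adj, scale)
    # phase has shape (node a, coefficient m, sample point k)
    phase = 1j * psi.T[:, :, None] * sample_points[None, None, :]
    phi = np.exp(phase).mean(axis=1)
    return np.concatenate([phi.real, phi.imag], axis=1)


# Toy example: path graph 0-1-2-3-4-5-6.
n = 7
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

emb = structural_embeddings(adj)
dist_endpoints = np.linalg.norm(emb[0] - emb[6])       # far apart, same role
dist_end_vs_center = np.linalg.norm(emb[0] - emb[3])   # different roles
print("endpoint vs endpoint:", dist_endpoints)
print("endpoint vs center:  ", dist_end_vs_center)
```

Because the empirical characteristic function averages over all coefficients of a node's wavelet, it does not depend on node ordering, which is why structurally equivalent nodes receive matching embeddings regardless of their distance in the graph.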


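Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Number of papers published: 276

Review time: 39 days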

Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.